The goal of EndoMineR is to extract as much information as possible from free or semi-structured endoscopy reports and their associated pathology specimens.
Gastroenterology now has many standards against which practice is measured, although many reporting systems do not include the capability to give anything more than basic analysis. Much of the data is locked in semi-structured text. However, the nature of semi-structured text means that data can be extracted in a standardised way; it just requires more manipulation. This package provides that manipulation so that complex endoscopic-pathological analyses, in line with recognised standards for these analyses, can be done.
The package is divided into three parts. How all the functions are connected is shown in the adjoining figure. The import of the raw data is left up to the user, with the overall aim being that all the data is present in one dataframe. The user can either load data so that each row is an endoscopic episode (or a pathology report) in its raw form and then allow the package to extract the relevant parts, or the data can be pre-extracted (i.e. separate columns for the Endoscopist, medication given etc.) so that the Extraction step is skipped. The package can take either form.
Extraction: This step is needed only when the data is provided as full text reports. You may already have the data in a spreadsheet, in which case this part isn’t necessary. The extraction is provided as one function, Extractor, explained below.
Cleaning: These are a group of functions that allow the user to extract and clean data commonly found in endoscopic and pathology reports. The cleaning functions usually remove common typos or extraneous information and do some reformatting. Some of the functions will also extract derived data into separate columns. The cleaning functions are provided on a per column basis (so if you have a column containing the endoscopist name, for example, then EndoscEndoscopist will clean this). However, convenience functions are also provided to run several cleaning functions at the same time as long as the relevant columns are present. EndoscAll, for example, will run several of the cleaning functions as long as the columns are properly named so that the subfunctions are run on the correct columns. This is also true for HistolAll.
Analyses: The analyses provide graphing functions as well as analyses according to the cornerstone questions in gastroenterology, namely surveillance, patient tracking, quality of endoscopy and pathology reporting, and diagnostic yield, as explained in the EndoMineR principles pages. The analyses are separated into generic analyses that are relevant to any endo-pathological dataset, and specific analyses for adenoma detection rates and Barrett’s surveillance and therapy. Further disease specific datasets will be included in future iterations.
Endoscopic and pathological data will come in one of two forms: either as a collection of whole text reports or as spreadsheets with some degree of separation into different columns for the various aspects of the report, e.g. who the Endoscopist was, the patient’s unique identifier etc. For the latter, the package user will not need to extract information as it is already extracted, and so can go straight to cleaning the data. For the former, the Extractor function has been provided:
Different hospitals will use different software with different headings for endoscopic reports. The Extractor function allows the user to define the separations in a report so that all reports can be automatically placed into a meaningful dataframe for further cleaning. Here we use the in-built datasets that are part of the package.
A list of keywords is then constructed. This list is made up of the words that will be used to split the document. It is very common for individual departments in both gastroenterology and pathology to use semi-structured reporting so that the same headers are used between patient reports. The list is therefore populated with these headers as defined by the user. The Extractor then does the splitting for each pair of words, places the text between the two delimiters in a new column, names that column after the first keyword in the pair (with some cleaning up) and returns the new dataframe. Here we use an example dataset (which has not had separate columns selected already) as the input:
PathReportWhole |
---|
Hospital: Random NHS Foundation Trust Hospital Number: H2890235 Patient Name: al-Bilal, Widdad DOB: 1922-05-04 General Practitioner: Dr. Mondragon, Amber Date received: 2002-11-10 Clinical Details: Previous had serrated lesions ?,If looks more like UC, please provide Nancy severity index 3 specimen. Nature of specimen: Nature of specimen as stated on pot = ‘Ascending colon x2’|,Nature of specimen as stated on request form = ‘rectum’|,Nature of specimen as stated on pot = ‘4X LOWER, 4X UPPER OESOPHAGUS’|,Nature of specimen as stated on pot = ‘rectal polyp’| Macroscopic description: 1 specimens collected the largest measuring 3 x 5 x 2 mm and the smallest 3 x 5 x 5 mm Histology: The appearances are of a hyperplastic polyp.,8 pieces of tissue, the largest measuring 4 x 36 x 2 mm and the smallest 3 x 3.,Completeness of excision is uncertain as the base is not clearly visualised.,There is no ulceration.,Kikuchi level: sm2. Diagnosis: Colon, biopsy - Normal.,- Focal granulomatous inflammation, non-necrotising.,Duodenum, biopsy - within normal histological limits.,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia.,- Hyperplastic polyp .,Caecum polyp biopsies:- tubular adenoma, low grade dysplasia.,- Mild chronic inflammation within the oesophageal mucosa.,Sigmoid colon biopsies:- normal mucosa.,Sigmoid polyp excision:- tubular adenoma. |
We can then define the list of delimiters that will split this text into separate columns, title the columns according to the delimiters and return a dataframe. Each column simply contains the text between the delimiters that the user has defined. These columns are then ready for the more refined cleaning provided by subsequent functions.
mywords<-c("Hospital Number","Patient Name:","DOB:","General Practitioner:",
"Date received:","Clinical Details:","Macroscopic description:",
"Histology:","Diagnosis:")
PathDataFrameFinalColon2<-Extractor(PathDataFrameFinalColon2,"PathReportWhole",mywords)
HospitalNumber | PatientName | DOB | GeneralPractitioner | Datereceived | ClinicalDetails | Macroscopicdescription | Histology | Diagnosis |
---|---|---|---|---|---|---|---|---|
H2890235 | al-Bilal, Widdad | 1922-05-04 | Dr Mondragon, Amber | 2002-11-10 | Previous had serrated lesions ?,If looks more like UC, please provide Nancy severity index 3 specimen Nature of specimen: Nature of specimen as stated on pot = ‘Ascending colon x2’|,Nature of specimen as stated on request form = ‘rectum’|,Nature of specimen as stated on pot = ‘4X LOWER, 4X UPPER OESOPHAGUS’|,Nature of specimen as stated on pot = ‘rectal polyp’| | 1 specimens collected the largest measuring 3 x 5 x 2 mm and the smallest 3 x 5 x 5 mm | The appearances are of a hyperplastic polyp ,8 pieces of tissue, the largest measuring 4 x 36 x 2 mm and the smallest 3 x 3 ,Completeness of excision is uncertain as the base is not clearly visualised ,There is no ulceration ,Kikuchi level: sm2 | Colon, biopsy - Normal ,- Focal granulomatous inflammation, non-necrotising ,Duodenum, biopsy - within normal histological limits ,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia ,- Hyperplastic polyp ,Caecum polyp biopsies:- tubular adenoma, low grade dysplasia ,- Mild chronic inflammation within the oesophageal mucosa ,Sigmoid colon biopsies:- normal mucosa ,Sigmoid polyp excision:- tubular adenoma |
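The splitting idea described above can be sketched in base R. This is only an illustration of the technique, not the package's Extractor: the helper `split_by_headers` and the toy report are hypothetical, and the sketch assumes that the headers appear in the text in the order given and that each header occurs once.

```r
# Hypothetical helper (NOT the package's Extractor): split one report into
# named fields using an ordered vector of header keywords.
split_by_headers <- function(report, headers) {
  fields <- list()
  for (i in seq_along(headers)) {
    start <- regexpr(headers[i], report, fixed = TRUE)
    if (as.integer(start) == -1) next          # header absent: skip it
    # The field starts straight after the header text
    from <- as.integer(start) + attr(start, "match.length")
    # The field ends where the next header begins, or at the end of the text
    to <- nchar(report)
    if (i < length(headers)) {
      nxt <- as.integer(regexpr(headers[i + 1], report, fixed = TRUE))
      if (nxt != -1) to <- nxt - 1
    }
    # Column name: the header with spaces and punctuation stripped
    name <- gsub("[^[:alnum:]]", "", headers[i])
    fields[[name]] <- trimws(substr(report, from, to))
  }
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Toy report invented for this example
report <- "Patient Name: Doe, Jane DOB: 1950-01-01 Diagnosis: Normal mucosa."
out <- split_by_headers(report, c("Patient Name:", "DOB:", "Diagnosis:"))
print(out)
```

Because the sketch searches for the first occurrence of each header, a header that is also a substring of an earlier one (for example "Hospital" inside "Hospital Number") would need more careful handling.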
Once the extraction has been done into separate columns, various cleaning functions can be used for individual columns. This is illustrated in the figure below. No particular column has to be present in the final dataframe once extraction has happened. The functions are defined according to the columns most likely to be output from the extraction of a typical dataset. If endoscopy reports are being extracted then the functions concentrate on these.
For example, if the Endoscopist name has been pulled out, the EndoscEndoscopist function can be used, which returns the submitted data frame with the Endoscopist column cleaned up.
The endoscopist column might initially look like this (as the last column in this dataframe):
HospitalNumber | PatientName | GeneralPractitioner | Dateofprocedure | Endoscopist |
---|---|---|---|---|
J6044658 | Jargon, Victoria | Dr Martin, Marche | 2009-11-11 | Dr Sullivan, Shelby |
Y6417773 | Powell, Destiny | Dr al-Safi, Lutfiyya | 2008-06-15 | Dr Kekich, Annabelle |
B6072011 | Martinez-Santos, Ana | Dr Rogers, Monica | 2007-10-27 | Dr Sullivan, Shelby |
G1449886 | Lopez, Maria | Dr Heilman, Lisa | 2002-03-17 | Dr Avitia-Ramirez, Alondra |
V1607560 | al-Rahimi, Rif’a | Dr Krumland, Lisa | 2011-12-05 | Dr Greimann, Phoua |
I8031481 | Forrest, Dazheea | Dr Millman, Arianna | 2014-09-19 | Dr Avitia-Ramirez, Alondra |
W2120051 | Naperola, Breanna | Dr Vigil, Lidia | 2002-05-28 | Dr Martinez, Maegen |
O7163832 | Zuni, Shannon | Dr Merced, Essence | 2009-09-19 | Dr Anderson, Alana |
P6620949 | Gomez Barron, Erin | Dr Ursery, Dezire | 2003-10-02 | Dr Anderson, Alana |
L4378217 | Hamm, Shebra | Dr Bauman, Caitlin | 2016-11-22 | Dr Ives, Rashiah |
Myendo2<-EndoscEndoscopist(Myendo,'Endoscopist')
This function cleans common things found in the text that may cause confusion, such as removing the titles ahead of the Endoscopist’s name, removing whitespace etc. This is important to prevent the same Endoscopist being counted twice because of, for example, the presence or absence of a ‘.’ after Dr, amongst other variations. The result is as follows:
HospitalNumber | PatientName | GeneralPractitioner | Dateofprocedure | Endoscopist |
---|---|---|---|---|
J6044658 | Jargon, Victoria | Dr Martin, Marche | 2009-11-11 | Sullivan, Shelby |
Y6417773 | Powell, Destiny | Dr al-Safi, Lutfiyya | 2008-06-15 | Kekich, Annabelle |
B6072011 | Martinez-Santos, Ana | Dr Rogers, Monica | 2007-10-27 | Sullivan, Shelby |
G1449886 | Lopez, Maria | Dr Heilman, Lisa | 2002-03-17 | Avitia-Ramirez, Alondra |
V1607560 | al-Rahimi, Rif’a | Dr Krumland, Lisa | 2011-12-05 | Greimann, Phoua |
I8031481 | Forrest, Dazheea | Dr Millman, Arianna | 2014-09-19 | Avitia-Ramirez, Alondra |
W2120051 | Naperola, Breanna | Dr Vigil, Lidia | 2002-05-28 | Martinez, Maegen |
O7163832 | Zuni, Shannon | Dr Merced, Essence | 2009-09-19 | Anderson, Alana |
P6620949 | Gomez Barron, Erin | Dr Ursery, Dezire | 2003-10-02 | Anderson, Alana |
L4378217 | Hamm, Shebra | Dr Bauman, Caitlin | 2016-11-22 | Ives, Rashiah |
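The kind of normalisation described above can be sketched with base R regular expressions. This is a minimal illustration rather than the code inside EndoscEndoscopist: the helper `clean_names` and the list of titles are assumptions made for the example.

```r
# Hypothetical helper (NOT the package's EndoscEndoscopist): strip a leading
# title and tidy whitespace so variants of one name collapse to one value.
clean_names <- function(x) {
  # Remove a leading title, with or without a trailing full stop
  x <- gsub("^[[:space:]]*(Dr|Mr|Mrs|Ms|Miss|Prof)\\.?[[:space:]]+", "", x,
            ignore.case = TRUE)
  # Collapse runs of whitespace, then trim both ends
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Invented variants of the same names
raw <- c("Dr Sullivan, Shelby", "Dr. Sullivan,  Shelby ", " dr Kekich, Annabelle")
cleaned <- clean_names(raw)
print(unique(cleaned))
```

After cleaning, the first two entries are identical, so grouping or counting by endoscopist treats them as one person.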
The EndoscMeds function currently extracts Fentanyl, Pethidine, Midazolam and Propofol doses into separate columns and reformats them as numeric columns so further calculations can be done.
Several other similar clean up functions are available for Endoscopy as follows. We will extract from the raw endoscopy data first:
mywords<-c("Hospital:","Hospital Number:","Patient Name:","General Practitioner:","Date of procedure:","Endoscopist:","Second Endoscopist",
"Medications:","Instrument:","Extent of Exam:","Indications:","Procedure Performed:",
"Findings:","Diagnosis:")
TheOGDReportFinal2<-Extractor(TheOGDReportFinal,"OGDReportWhole",mywords)
TheOGDReportFinal2df<-data.frame(TheOGDReportFinal2["HospitalNumber"],TheOGDReportFinal2["Instrument"],TheOGDReportFinal2["Indications"],TheOGDReportFinal2["Medications"],TheOGDReportFinal2["ProcedurePerformed"])
pander(head(TheOGDReportFinal2df,10))
HospitalNumber | Instrument | Indications | Medications | ProcedurePerformed |
---|---|---|---|---|
J6044658 | FG5 | Follow-up ULCER HEALING | Fentanyl 12 5mcg Midazolam 6mg | Gastroscopy (OGD) |
Y6417773 | FG6 | Weight Loss | Fentanyl 125mcg Midazolam 7mg | Gastroscopy (OGD) |
B6072011 | FG2 | Follow-up ULCER HEALING | Fentanyl 125mcg Midazolam 6mg | Gastroscopy (OGD) |
G1449886 | FG1 | Other- | Fentanyl 12 5mcg Midazolam 2mg | Gastroscopy (OGD) |
V1607560 | FG2 | Previous OGD ? 8 months ago | Fentanyl 75mcg Midazolam 6mg | Gastroscopy (OGD) |
I8031481 | FG6 | Surveillance-Barrett’s | Fentanyl 150mcg Midazolam 3mg | Gastroscopy (OGD) |
W2120051 | FG5 | Dyspepsia | Fentanyl 125mcg Midazolam 5mg | Gastroscopy (OGD) |
O7163832 | FG2 | Oesophagus- Dysplasia | Fentanyl 75mcg Midazolam 3mg | Gastroscopy (OGD) |
P6620949 | FG4 | Oesophagus- Dysplasia | Fentanyl 25mcg Midazolam 1mg | Gastroscopy (OGD) |
L4378217 | FG7 | Therapeutic- Dilatation | Fentanyl 150mcg Midazolam 1mg | Gastroscopy (OGD) |
v<-EndoscMeds(TheOGDReportFinal2df,'Medications')
HospitalNumber | Instrument | Indications | Medications | ProcedurePerformed | Fent | Midaz | Peth | Prop |
---|---|---|---|---|---|---|---|---|
J6044658 | FG5 | Follow-up ULCER HEALING | Fentanyl 12 5mcg Midazolam 6mg | Gastroscopy (OGD) | 5 | 6 | 5 | 5 |
Y6417773 | FG6 | Weight Loss | Fentanyl 125mcg Midazolam 7mg | Gastroscopy (OGD) | 125 | 7 | 125 | 125 |
B6072011 | FG2 | Follow-up ULCER HEALING | Fentanyl 125mcg Midazolam 6mg | Gastroscopy (OGD) | 125 | 6 | 125 | 125 |
G1449886 | FG1 | Other- | Fentanyl 12 5mcg Midazolam 2mg | Gastroscopy (OGD) | 5 | 2 | 5 | 5 |
V1607560 | FG2 | Previous OGD ? 8 months ago | Fentanyl 75mcg Midazolam 6mg | Gastroscopy (OGD) | 75 | 6 | 75 | 75 |
I8031481 | FG6 | Surveillance-Barrett’s | Fentanyl 150mcg Midazolam 3mg | Gastroscopy (OGD) | 150 | 3 | 150 | 150 |
W2120051 | FG5 | Dyspepsia | Fentanyl 125mcg Midazolam 5mg | Gastroscopy (OGD) | 125 | 5 | 125 | 125 |
O7163832 | FG2 | Oesophagus- Dysplasia | Fentanyl 75mcg Midazolam 3mg | Gastroscopy (OGD) | 75 | 3 | 75 | 75 |
P6620949 | FG4 | Oesophagus- Dysplasia | Fentanyl 25mcg Midazolam 1mg | Gastroscopy (OGD) | 25 | 1 | 25 | 25 |
L4378217 | FG7 | Therapeutic- Dilatation | Fentanyl 150mcg Midazolam 1mg | Gastroscopy (OGD) | 150 | 1 | 150 | 150 |
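Pulling a numeric dose out of free text can be sketched as follows. This is an illustration of the general idea under stated assumptions, not the logic used by EndoscMeds: the helper `extract_dose` is hypothetical, it assumes the dose immediately follows the drug name, and it returns NA when a drug is not mentioned.

```r
# Hypothetical helper (NOT the package's EndoscMeds): capture the number that
# directly follows a drug name and return it as numeric.
extract_dose <- function(text, drug) {
  # Drug name, optional spaces, then an integer or decimal number
  pattern <- paste0(drug, "[[:space:]]*([0-9]+(\\.[0-9]+)?)")
  m <- regmatches(text, regexec(pattern, text, ignore.case = TRUE))
  # Element 2 of each match is the captured number; NA when nothing matched
  as.numeric(vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                    character(1)))
}

# Invented medication strings
meds <- c("Fentanyl 125mcg Midazolam 7mg",
          "Fentanyl 75mcg Midazolam 6mg",
          "Propofol 50mg")
doses <- data.frame(
  Fent  = extract_dose(meds, "Fentanyl"),
  Midaz = extract_dose(meds, "Midazolam"),
  Prop  = extract_dose(meds, "Propofol")
)
print(doses)
```

Entries with a stray space inside the number, such as "12 5mcg", are not handled by this sketch and would need a rule of their own.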
EndoscInstrument, EndoscIndications and EndoscProcPerformed all perform similar cleaning functions on the endoscope number, the indication for the investigation and the actual procedure performed respectively. Future iterations will try to make these cleaning functions more generic and applicable to a wider number of use cases.
The cleaning functions for histology are a little more difficult as histology reports often have a greater degree of free text reporting. In general, each histology report can be divided into the Macroscopic description of a specimen, which itself comprises how many specimens there are for each sample sent (a sample can be a pot which includes several specimens) and how big each specimen is. The report will often give a detailed description of what is actually seen and then provide an overall diagnosis.
The histology cleaning functions are based around this. For example, the HistolHistol function cleans the Histology text if present.
The original input example can be seen here:
## [1] " Two biopsies consist of small bowel mucosa and are within normal histological limits\n\n"
## [2] " modified giemsa stain\n,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease\n,The nuclei are hyperchromatic,\n,There is no granulomatous inflammation\n,The appearances are in keeping with a reactive/chemical gastritis,features including basal layer hyperplasia and reactive nucelar changes with underlying\n,These are two biopsies of squamous epithelium within normal limits,fibromuscularisation of the lamina propria and mild chronic inflammation\n,These biopsies of columnar mucosa show focal acute inflammation, moderate chronic inflammation\n\n"
And once the function is run the result is here:
t<-HistolHistol(Mypath,'Histology')
## [1] ""
## [2] " modified giemsa stain ,These are biopsies of gastric mucosa .\nThe nuclei are hyperchromatic,\n.\n,The appearances are in keeping with a reactive/chemical gastritis,features including basal layer hyperplasia and reactive nucelar changes with underlying\n.\n,These biopsies of columnar mucosa show focal acute inflammation, moderate chronic inflammation\n"
Some pathology reports also provide an overall impression or a list of diagnoses interpreted from the description of the pathological text. This can also be extracted. The diagnoses may also include the absence of features, and a function is provided both to clean up the Diagnosis column and to exclude negative diagnoses. If a diagnosis column is present the function can be run as follows:
## [1] " Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia\n,Ileo-caecal valve, biopsies:\n,Stomach antrum biopsies:- normal mucosa\n,- Up to 34 eosinophils per high power field,Stomach, biopsy - Mild chronic inflammation\n"
## [2] " Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, biopsies - within normal histological limits\n,B GI biopsy - DISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2\n,Oesophagus, biopsies : - Minimal chronic inflammation,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:- 2 x papillomas\n,Duodenum biopsies:- normal\n"
with the following result:
t<-HistolDx(Mypath,'Diagnosis')
## [1] " Distal transverse colon polyp excision:\ntubular adenoma, low grade dysplasia\n,\nUp to 34 eosinophils per high power field,Stomach, biopsy \nMild chronic inflammation\n"
## [2] "B GI biopsy \nDISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2\n,Oesophagus, biopsies : \nMinimal chronic inflammation,Sigmoid colon, polypectomy: \nTubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:\n2 x papillomas\n,Duodenum biopsies:\nnormal\n"
Because the information from the Macroscopic Description is based around numbers, a further function has been provided called HistolNumbOfBx to extract the number of biopsies taken.
To extract the number, the user supplies the word that bounds the selection. The function collects everything matching the regex [0-9]{1,2}.{0,3} up to that boundary word. For example, if the report usually says:
Mypath.HospitalNumber | Mypath.PatientName | Mypath.Macroscopicdescription |
---|---|---|
J6044658 | Jargon, Victoria | 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm |
Y6417773 | Powell, Destiny | 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm |
B6072011 | Martinez-Santos, Ana | 9 specimens collected the largest measuring 2 x 5 x 2 mm and the smallest 1 x 1 x 4 mm |
G1449886 | Lopez, Maria | 4 specimens collected the largest measuring 5 x 4 x 1 mm and the smallest 1 x 3 x 3 mm |
V1607560 | al-Rahimi, Rif’a | 5 specimens collected the largest measuring 2 x 2 x 1 mm and the smallest 3 x 4 x 3 mm |
Based on this, the word that limits the number you are interested in is ‘specimen’, so the function and its output are:
v<-HistolNumbOfBx(Mypath,'Macroscopicdescription','specimen')
v.HospitalNumber | v.PatientName | v.NumbOfBx |
---|---|---|
J6044658 | Jargon, Victoria | 3 |
Y6417773 | Powell, Destiny | 4 |
B6072011 | Martinez-Santos, Ana | 9 |
G1449886 | Lopez, Maria | 4 |
V1607560 | al-Rahimi, Rif’a | 5 |
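The number-then-boundary-word idea can be sketched in base R. The helper `count_before_word` is hypothetical and only mirrors the regex quoted above; it is not the package's HistolNumbOfBx and it returns NA when the boundary word is not found.

```r
# Hypothetical helper (NOT the package's HistolNumbOfBx): capture a one or two
# digit number that sits within three characters of a boundary word.
count_before_word <- function(text, word) {
  pattern <- paste0("([0-9]{1,2}).{0,3}", word)
  m <- regmatches(text, regexec(pattern, text))
  # Element 2 of each match is the captured number; NA when nothing matched
  as.numeric(vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                    character(1)))
}

# One description in the style of the table above, and one without a count
macro <- c("3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm",
           "No tissue received")
n <- count_before_word(macro, "specimen")
print(n)
```

Choosing the boundary word per dataset (for example "specimen", "piece" or "biops") is what lets the same pattern ignore the measurement numbers later in the sentence.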
The user may want to extract specific diseases from a histology dataset. This can be done using the function HistolExtrapolDx, which simply takes the Diagnosis column and looks up the presence or absence of certain diseases. The function has been hard coded to look for dysplasia, cancer or GIST but will also take user defined words. These have to be supplied in the form of a regular expression, or can be left as an empty string as in the example:
Mypath3<-data.frame(Mypath["HospitalNumber"],Mypath["Diagnosis"])
HospitalNumber | Diagnosis |
---|---|
J6044658 | Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia ,Ileo-caecal valve, biopsies: ,Stomach antrum biopsies:- normal mucosa ,- Up to 34 eosinophils per high power field,Stomach, biopsy - Mild chronic inflammation |
Y6417773 | Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, biopsies - within normal histological limits ,B GI biopsy - DISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2 ,Oesophagus, biopsies : - Minimal chronic inflammation,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:- 2 x papillomas ,Duodenum biopsies:- normal |
B6072011 | - Background Barrett ‘s oesophagus,Sigmoid colon, biopsy - Adenocarcinoma ,- Gastric metaplasia,Oesophagus 36cm ’papilloma’ biopsy:- normal squamous mucosa ,- Chronic active inflammation,Oesophagus, biopsy - Barrett ’s oesophagus with moderate chronic inflammation ,- Minimal chronic inflammation |
G1449886 | Stomach, biopsy - Mild chronic inflammation and reactive changes ,- Normal,- note: biopsies put into the wrong pots ,Oesophagus, biopsy - Poorly differentiated tumour ,Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Mild chronic inflammation and oedema,-Inflammatory fibroid polyp,- within normal histological limits,- Negative for HLO |
V1607560 | Nodule GOJ, biopsies:- acute and chronic inflammation with Helicobacter ,Stomach, biopsy -Mild acute and chronic inflammation ,Oesophagus polyps biopsies:- 2 x papillomas ,- <1 mm from lateral margin,Duodenum biopsies:- patchy increase in IELs ,Duodenum, biopsy - Normal |
I8031481 | Duodenum and stomach, polyp biopsies - Consistent with hamartomatous polyps ,Gastric oesophageal junction, biopsies : - Chronic inflammation,Stomach, biopsy - Mild chronic inflammation ,- Gastric HER2 negative,- Minimal chronic inflammation,Oesophagus, biopsy - Acute inflammation in presumed proximal biopsies only ,Gastro-osophageal junction, biopsy - Squamocolumnar mucosa ,Descending colon biopsies:- normal mucosa ,- Within normalhistological limits,The biopsies of gastric oesophageal junction type squamo-columnar mucosa show mild chronic |
W2120051 | - Negative for helicobacter,- Intestinal metaplasia ,Oesophagus biopsies:- normal ,Ileum and colon biopsies:- normal mucosa ,Oesophagus, EMR 43P - Barrett ’s oesophagus without intestinal metaplasia ,Sigmoid colon, biopsy - Adenocarcinoma ,- Chronic active inflammation,Duodenum biopsies:- normal mucosa ,- Low grade dysplasia |
O7163832 | - possible eosinophilic oesophagitis,- Mild chronic gastritis,A -E) Stomach, polyps, biopsies: ,- Mild chronic inflammation and oedema,- Focal mild chronic inflammation,- Gastric HER2 negative,Ileum and colon biopsies:- normal mucosa ,Adjacent mucosa, biopsy - Normal small bowel mucosa |
P6620949 | - Chronic active gastritis,Stomach, biopsy - Chronic, moderately active Helicobacter associated gastritis ,Oesophaguas biopsies:- normal mucosa ,- Acute inflammatory exudate,Right and left colon, biopsies: - Within normal histological limits |
L4378217 | Stomach, biopsy - Chronic, moderately active Helicobacter associated gastritis ,- tubular adenoma, low grade dysplasia x 1 ,- Tubular adenoma,- Tubular adenoma,Sigmoid colon, polyp biopsy - Hyperplastic polyp ,Stomach, biopsy - Reactive gastritis and intestinal metaplasia ,- Chronic inflammation,- Mild chronic inflammation |
Mypath3<-HistolExtrapolDx(Mypath3,"Diagnosis","")
HospitalNumber | Diagnosis | Extracted |
---|---|---|
J6044658 | Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia ,Ileo-caecal valve, biopsies: ,Stomach antrum biopsies:- normal mucosa ,- Up to 34 eosinophils per high power field,Stomach, biopsy - Mild chronic inflammation | dyspla |
Y6417773 | Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, biopsies - within normal histological limits ,B GI biopsy - DISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2 ,Oesophagus, biopsies : - Minimal chronic inflammation,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:- 2 x papillomas ,Duodenum biopsies:- normal | dyspla, dyspla |
B6072011 | - Background Barrett ‘s oesophagus,Sigmoid colon, biopsy - Adenocarcinoma ,- Gastric metaplasia,Oesophagus 36cm ’papilloma’ biopsy:- normal squamous mucosa ,- Chronic active inflammation,Oesophagus, biopsy - Barrett ’s oesophagus with moderate chronic inflammation ,- Minimal chronic inflammation | carcin |
G1449886 | Stomach, biopsy - Mild chronic inflammation and reactive changes ,- Normal,- note: biopsies put into the wrong pots ,Oesophagus, biopsy - Poorly differentiated tumour ,Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Mild chronic inflammation and oedema,-Inflammatory fibroid polyp,- within normal histological limits,- Negative for HLO | tumour, dyspla |
V1607560 | Nodule GOJ, biopsies:- acute and chronic inflammation with Helicobacter ,Stomach, biopsy -Mild acute and chronic inflammation ,Oesophagus polyps biopsies:- 2 x papillomas ,- <1 mm from lateral margin,Duodenum biopsies:- patchy increase in IELs ,Duodenum, biopsy - Normal | |
I8031481 | Duodenum and stomach, polyp biopsies - Consistent with hamartomatous polyps ,Gastric oesophageal junction, biopsies : - Chronic inflammation,Stomach, biopsy - Mild chronic inflammation ,- Gastric HER2 negative,- Minimal chronic inflammation,Oesophagus, biopsy - Acute inflammation in presumed proximal biopsies only ,Gastro-osophageal junction, biopsy - Squamocolumnar mucosa ,Descending colon biopsies:- normal mucosa ,- Within normalhistological limits,The biopsies of gastric oesophageal junction type squamo-columnar mucosa show mild chronic | |
W2120051 | - Negative for helicobacter,- Intestinal metaplasia ,Oesophagus biopsies:- normal ,Ileum and colon biopsies:- normal mucosa ,Oesophagus, EMR 43P - Barrett ’s oesophagus without intestinal metaplasia ,Sigmoid colon, biopsy - Adenocarcinoma ,- Chronic active inflammation,Duodenum biopsies:- normal mucosa ,- Low grade dysplasia | carcin, dyspla |
O7163832 | - possible eosinophilic oesophagitis,- Mild chronic gastritis,A -E) Stomach, polyps, biopsies: ,- Mild chronic inflammation and oedema,- Focal mild chronic inflammation,- Gastric HER2 negative,Ileum and colon biopsies:- normal mucosa ,Adjacent mucosa, biopsy - Normal small bowel mucosa | |
P6620949 | - Chronic active gastritis,Stomach, biopsy - Chronic, moderately active Helicobacter associated gastritis ,Oesophaguas biopsies:- normal mucosa ,- Acute inflammatory exudate,Right and left colon, biopsies: - Within normal histological limits | |
L4378217 | Stomach, biopsy - Chronic, moderately active Helicobacter associated gastritis ,- tubular adenoma, low grade dysplasia x 1 ,- Tubular adenoma,- Tubular adenoma,Sigmoid colon, polyp biopsy - Hyperplastic polyp ,Stomach, biopsy - Reactive gastritis and intestinal metaplasia ,- Chronic inflammation,- Mild chronic inflammation | dyspla |
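A lookup of this kind can be sketched with a single alternation pattern. The helper `find_terms` and its built-in term list are assumptions chosen to resemble the output above; they are not taken from the HistolExtrapolDx source.

```r
# Hypothetical helper (NOT the package's HistolExtrapolDx): list every
# occurrence of a set of disease stems, plus an optional user pattern.
find_terms <- function(text, user_pattern = "") {
  base_pattern <- "dyspla|carcin|tumour|\\bgist\\b"   # assumed term list
  pattern <- if (nzchar(user_pattern)) {
    paste(base_pattern, user_pattern, sep = "|")
  } else {
    base_pattern
  }
  hits <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE))
  # Join the hits per report; a report with no hits gives an empty string
  vapply(hits, function(h) paste(tolower(h), collapse = ", "), character(1))
}

# Invented diagnosis strings
dx <- c("Tubular adenoma with mild dysplasia, Sigmoid colon - Adenocarcinoma",
        "Duodenum, biopsy - Normal")
found <- find_terms(dx)
found_user <- find_terms(dx, "adenoma")
print(found)
```

A plain pattern match like this cannot tell "dysplasia" from "no dysplasia", so negated statements need to be dealt with separately before the lookup is trusted.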
Other functions are less generally applicable but may be useful in certain hospitals and certain situations.
In addition, if there is a need to remove all sentences that give negative diagnoses (e.g. “There is no evidence of…”) so that false positive diagnoses are not made during the analysis stage, a further function called NegativeRemove can be applied. It can be used as a standalone function but is also implemented within the HistolDx function, which extracts and cleans the diagnosis from the Histology text to provide a Simplified Diagnosis column.
The original input example can be seen here:
## [1] " - Negative for helicobacter,- Intestinal metaplasia ,Oesophagus biopsies:- normal\n,Ileum and colon biopsies:- normal mucosa\n,Oesophagus, EMR 43P - Barrett 's oesophagus without intestinal metaplasia\n,Sigmoid colon, biopsy - Adenocarcinoma\n,- Chronic active inflammation,Duodenum biopsies:- normal mucosa\n,- Low grade dysplasia"
If we apply the function NegativeRemove we see this changes to:
MypathNegRem<-NegativeRemove(Mypath,"Diagnosis")
## [1] ",Ileum and colon biopsies:- normal mucosa\n,Oesophagus, EMR 43P - Barrett 's oesophagus without intestinal metaplasia\n,Sigmoid colon, biopsy - Adenocarcinoma\n,- Chronic active inflammation,Duodenum biopsies:- normal mucosa\n,- Low grade dysplasia"
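The principle of dropping negated statements can be sketched as follows. This is a simplified illustration, not the NegativeRemove implementation: the helper `drop_negative_sentences`, its short list of negative phrases and the choice to split on full stops and newlines are all assumptions.

```r
# Hypothetical helper (NOT the package's NegativeRemove): split text into
# sentences and drop any sentence that contains a negative phrase.
drop_negative_sentences <- function(text,
                                    negatives = c("no evidence of",
                                                  "negative for",
                                                  "there is no")) {
  pattern <- paste(negatives, collapse = "|")
  vapply(text, function(t) {
    # Treat full stops and newlines as sentence boundaries
    sentences <- trimws(unlist(strsplit(t, "[.\n]")))
    sentences <- sentences[nzchar(sentences)]
    keep <- sentences[!grepl(pattern, sentences, ignore.case = TRUE)]
    paste(keep, collapse = ". ")
  }, character(1), USE.NAMES = FALSE)
}

# Invented histology text
txt <- "There is no evidence of coeliac disease. The nuclei are hyperchromatic.\nNegative for HLO"
res <- drop_negative_sentences(txt)
print(res)
```

A real phrase list would need to be much longer, and phrases such as "without" or "within normal limits" need care because they can appear inside an otherwise positive diagnosis.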