Search PubMed with R, Part 3

Query PubMed titles for systemic lupus erythematosus with the R package RISmed (1)
# Type the following in the R console:
library(RISmed)
lupus <- EUtilsSummary('lupus[Ti] erythematosus[ti] systemic[Ti]', retmax=200)
# retmax refers to the maximum number of records to retrieve; the default is 1000.
fetch.lupus <- EUtilsGet(lupus)
fetch.lupus
# Results: PubMed query: lupus[Ti] AND erythematosus[ti] AND systemic[Ti] Records: 200
lupus.tit <- ArticleTitle(fetch.lupus)
lupus.tit[1:10] # to view the first 10 results of titles
# export results to text file
write(lupus.tit, file="lupusRISmedTi.txt")
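The Results line shows that PubMed joins the three title-tagged terms with AND. A minimal sketch of assembling such a query string in base R follows; the helper name build_title_query is hypothetical and is not part of RISmed.

```r
# Hypothetical helper (not part of RISmed): tag each term with [Ti]
# and join the tagged terms with AND to form a PubMed title query.
build_title_query <- function(terms) {
  paste0(terms, "[Ti]", collapse = " AND ")
}

query <- build_title_query(c("lupus", "erythematosus", "systemic"))
print(query)
# The resulting string could then be passed on, for example:
# lupus <- EUtilsSummary(query, retmax = 200)
```

Building the string from a vector of terms makes it easier to reuse the same steps for another disease name.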
References
1. RISmed package: Stephanie Kovalchik (2013). RISmed: Download content from NCBI databases. R package version 2.1.0. http://CRAN.R-project.org/package=RISmed
Query PubMed titles for systemic lupus erythematosus using RISmed
View results of the exported text file
Export results to a text file with the R command line:
write(lupus.tit, file="lupusRISmedTi.txt")
# export the title results as a text file and open the file in Excel or any text editor
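Before opening the file in another program, the export can also be checked from R. The sketch below writes two placeholder titles (not real PubMed results) to a temporary folder and reads them back; in practice the vector would be lupus.tit and the file would be lupusRISmedTi.txt in the working directory.

```r
# Placeholder titles; in practice use lupus.tit from ArticleTitle().
titles <- c("First placeholder title", "Second placeholder title")

# Write to a temporary folder so nothing in the working directory is touched.
out_file <- file.path(tempdir(), "lupusRISmedTi.txt")
write(titles, file = out_file)   # a character vector is written one element per line

# Read the file back and inspect it.
check <- readLines(out_file)
length(check)                    # number of titles written
head(check, 10)                  # first titles, as in lupus.tit[1:10]
```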
Find the Title Verb Relation with Reverb
REVERB (1) is an open information extraction program, distributed as an executable jar and developed by the University of Washington's Turing Center.
It is important to note that Reverb depends on Java; it is not an R program.
Reverb is powerful and provides useful information about the relation structure of a text. It is relatively easy to use and runs very fast.
In our case we will apply Reverb to our text file of title results.
Reference:
@inproceedings{ReVerb2011, author = {Anthony Fader and Stephen Soderland and Oren Etzioni},
title = {Identifying Relations for Open Information Extraction}, booktitle = {Proceedings of the Conference of Empirical Methods in Natural Language Processing ({EMNLP} '11)}, year = {2011}, month = {July 27-31}, address = {Edinburgh, Scotland, UK} }
Install Reverb
You can download the latest ReVerb jar from
http://reverb.cs.washington.edu/reverb-latest.jar
This executable jar file is easy to run from the MS-DOS command line.
At https://github.com/knowitall/reverb/ you can find how to use Reverb. It provides the following example, which illustrates what it does:
ReVerb takes raw text as input, and outputs (argument1, relation phrase, argument2) triples. For example, given the sentence "Bananas are an excellent source of potassium," ReVerb will extract the triple (bananas, be source of, potassium).
In order to run Reverb you need to have Java installed on your computer. You can install Java from https://www.java.com/en/download/
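Since the rest of this tutorial works in R, one way to check for Java without leaving the R console is sketched below. It only looks for an executable named java on the PATH, which is an assumption about how Java was installed.

```r
# Sys.which() returns the full path of an executable, or "" if none is found.
java_path <- unname(Sys.which("java"))
java_found <- nzchar(java_path)

if (java_found) {
  message("Java found at: ", java_path)
} else {
  message("Java not found on the PATH; install it before running Reverb.")
}
```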
Use of Reverb
Place the reverb-latest.jar file and the result file lupusRISmedTi.txt in the same folder.
The figure shows an example of the 2 files in the same folder (which we named Reverb-Java).
Use of Reverb
1- Open the MS-DOS cmd and type the path of the folder (Reverb-Java in our example) containing both files: reverb-latest.jar and lupusRISmedTi.txt
Use of Reverb
2- Type the following cmd line to view results on the console:
java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt
Results are displayed in the MS-DOS window
Use of Reverb - export the results to an xls file
3- Type the following cmd line to export results to a file:
java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt > ReverbLupusRISmedTi.txt
(The name given to the file was ReverbLupusRISmedTi.txt. You can use another name, or even export to an xls file by typing ReverbLupusRISmedTi.xls instead.)
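The same command can also be launched from the R console with system2(), which is part of base R. This is only a sketch: the helper reverb_args is hypothetical, and the call is skipped unless java, the jar and the input file are all found in the working directory.

```r
# Hypothetical helper: build the arguments of the java call shown above.
reverb_args <- function(jar = "reverb-latest.jar", input = "lupusRISmedTi.txt") {
  c("-Xmx512m", "-jar", jar, input)
}

cmd_args <- reverb_args()
out_file <- "ReverbLupusRISmedTi.txt"

# Run only when Java and both files are available.
if (nzchar(Sys.which("java")) && file.exists(cmd_args[3]) && file.exists(cmd_args[4])) {
  # stdout = <file name> sends the output to that file, like ">" on the command line
  system2("java", args = cmd_args, stdout = out_file)
} else {
  message("Skipping: java, the jar, or the input file was not found.")
}
```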
Open the Reverb result file ReverbLupusRISmedTi.txt with MS Excel
Reverb output
The Reverb output has 18 columns (see results in the Excel file).
The most interesting are:
Col 3 (Col C): Argument1
Col 4 (Col D): Verb relation phrase
Col 5 (Col E): Argument2
(Col 12 refers to the confidence that this extraction is correct and Col 2 refers to the sentence number where the extraction came from)
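The column positions above can also be picked out in R. The sketch below builds a stand-in for a single output row with 18 columns; only the five columns described above are filled, the text comes from the results shown in this deck, and the sentence number and confidence value are made up for illustration.

```r
# Stand-in for one Reverb output row: 18 columns named V1..V18.
# The sentence number and the confidence value are made-up placeholders.
one_row <- as.data.frame(matrix(NA_character_, nrow = 1, ncol = 18),
                         stringsAsFactors = FALSE)
one_row$V2  <- "1"                               # sentence number
one_row$V3  <- "Prednisone"                      # argument 1
one_row$V4  <- "induced"                         # verb relation phrase
one_row$V5  <- "two-way myocardial development"  # argument 2
one_row$V12 <- "0.8"                             # confidence (hypothetical)

# Keep the five columns of interest and give them readable names.
keep <- one_row[, c(2, 3, 4, 5, 12)]
names(keep) <- c("sentence", "arg1", "relation", "arg2", "confidence")
keep$confidence <- as.numeric(keep$confidence)
print(keep)
```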
Reverb Results
Results of the first 5 rows (Excel) from columns 3-5
1- childhood-onset systemic lupus erythematosus is associated with ethnicity
2- renal involvement are lower in ACE inhibitor-treated patients
3- Prednisone induced two-way myocardial development
4- Acetylated histones contribute to the immunostimulatory potential of Neutrophil Extracellular Traps
5- clinical practice monitor the impact of systemic lupus erythematosus
Note: Blue color refers to argument 1; white color is the verb relation; orange color refers to argument 2
Prepare Reverb Results data for R Wordcloud
# use the read.table script (from reference 1) as follows:
d <- read.table('ReverbLupusRISmedTi.txt', quote='', comment.char='',
                allowEscapes=FALSE, sep='\t', header=FALSE,
                as.is=TRUE, stringsAsFactors=FALSE)
# make sure the data is a data frame (read.table already returns one)
e <- as.data.frame(d)
# merge columns (3-5) into a single text sentence
f <- paste(e$V3, e$V4, e$V5)
f[1:3] # view the first 3 lines
# [1] "childhood-onset systemic lupus erythematosus is associated with ethnicity"
# [2] "renal involvement are lower in ACE inhibitor-treated patients"
# [3] "Prednisone induced two-way myocardial development"
Reference:
1. John Mount, "Please stop using Excel-like formats to exchange data", December 7th, 2012
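Because column 12 holds the confidence of each extraction, one optional step is to drop low-confidence rows before merging the columns. The following is a sketch only: the confidence values and the 0.5 threshold are made up, and the two rows reuse text from the results above.

```r
# Stand-in data frame with only the columns needed here.
e <- data.frame(
  V3  = c("Prednisone", "renal involvement"),
  V4  = c("induced", "are lower in"),
  V5  = c("two-way myocardial development", "ACE inhibitor-treated patients"),
  V12 = c(0.9, 0.3),              # made-up confidence values
  stringsAsFactors = FALSE
)

min_conf <- 0.5                   # hypothetical threshold
kept <- e[e$V12 >= min_conf, ]    # keep rows at or above the threshold

# Merge columns 3-5 of the remaining rows into sentences.
f <- paste(kept$V3, kept$V4, kept$V5)
print(f)
```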
Represent Reverb Results in R Wordcloud
library(tm)
my.corpus <- Corpus(VectorSource(f))
summary(my.corpus)
inspect(my.corpus [1:3])
my.corpus <- tm_map(my.corpus, removeWords, stopwords("english"))
#my.corpus <- tm_map(my.corpus, stemDocument)
myTdm<- TermDocumentMatrix(my.corpus, control =
list(wordLengths=c(1,Inf)))
myTdm
# A term-document matrix (140 terms, 26 documents)
# Non-/sparse entries: 163/3477
# Sparsity : 96%
# Maximal term length: 22
# Weighting : term frequency (tf)
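To make the term-document matrix less abstract, the sketch below counts terms for two of the sentences above using base R only. It splits on single spaces and does not remove stop words, which is a simplification and not what the tm package does internally.

```r
docs <- c("prednisone induced two-way myocardial development",
          "childhood-onset systemic lupus erythematosus is associated with ethnicity")

# Split each document on spaces (a simplification of tm's tokenizer).
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Rows = terms, columns = documents, cells = term frequency (tf).
tdm <- vapply(tokens,
              function(tk) as.integer(table(factor(tk, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms
colnames(tdm) <- paste0("doc", seq_along(docs))

print(tdm)
rowSums(tdm)   # overall frequency of each term, as used for the wordcloud
```

Most cells are zero because each term appears in only one of the two documents, which is what the Sparsity line in the output above summarises.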
Represent Reverb Results in R Wordcloud
findFreqTerms(myTdm, lowfreq=2)
# [1] "associated" "damage" "distinct" "erythematosus"
# [5] "increased" "independently" "lupus" "systemic"
termFrequency <- rowSums(as.matrix(myTdm))
termFrequency <- subset(termFrequency, termFrequency>=10)
m <- as.matrix(myTdm)
wordFreq <- sort(rowSums(m), decreasing=TRUE) # This yields word frequency
library (wordcloud)
library(RColorBrewer) # provides brewer.pal(), used below
set.seed(375)
pal1 <- brewer.pal(6,"Dark2")
wordcloud(words=names(wordFreq), freq=wordFreq,
scale=c(2,.9),min.freq=1, random.order=F, colors= pal1)
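The effect of a frequency threshold such as termFrequency>=10 depends on how large the counts are. The sketch below uses made-up counts for four of the terms listed by findFreqTerms() above to show how the threshold changes what is kept.

```r
# Made-up counts for illustration; real values come from rowSums(as.matrix(myTdm)).
termFrequency <- c(lupus = 12, systemic = 11, damage = 2, distinct = 2)

high <- subset(termFrequency, termFrequency >= 10)  # strict threshold
low  <- subset(termFrequency, termFrequency >= 2)   # looser threshold

print(high)
print(low)
sort(termFrequency, decreasing = TRUE)              # same ordering idea as wordFreq
```

With a small set of titles, a high threshold may leave very few terms, so it is worth printing the result before plotting.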
R Wordcloud of Reverb Results
