The getLattes R package, written by Roney Fraga Souza and Winicius Sabino, was built to extract data from the Lattes curriculum platform exported as XML. The XML file needs to be extracted from the .zip archive.
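The extraction step can be scripted in base R. The following is a minimal sketch, not part of getLattes: `extract_lattes_zips()` is a hypothetical helper, and the directory names are assumptions chosen to match the `datatest` directory used in the examples in this README. It relies only on `utils::unzip()`.

```r
# Hypothetical helper (not part of getLattes): unzip every .zip archive
# found in `zip_dir` into `out_dir` and return the extracted file paths.
extract_lattes_zips <- function(zip_dir, out_dir = "datatest") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  # utils::unzip() returns the paths of the files it extracted
  extracted <- unlist(lapply(zips, function(z) utils::unzip(z, exdir = out_dir)))
  # unlist() of an empty list is NULL, so normalise to an empty character vector
  if (is.null(extracted)) character(0) else extracted
}

# usage, assuming the downloaded archives are in a directory named "downloads":
# xml_files <- extract_lattes_zips("downloads", out_dir = "datatest")
```

The extracted files can then be read from `out_dir`.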
To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.
You can install the released version of getLattes from GitHub with:
    # install and load devtools from CRAN
    install.packages("devtools")
    library(devtools)

    # install and load getLattes
    devtools::install_github("roneyfraga/getLattes")
    library(getLattes)
    # the file 4984859173592703.xml is stored in the datatest directory
    cl <- readLattes(filexml='4984859173592703.xml', path='datatest/')

    # import all Lattes XML files in datatest
    cls <- readLattes(filexml='*.xml$', path='datatest/')

    # import all Lattes XML files in the working directory
    cls <- readLattes(filexml='*.xml$')
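Before importing a whole directory, it can help to check which files a pattern will actually pick up. The sketch below uses only base R and is independent of getLattes: `list_lattes_xml()` is a hypothetical helper, and it assumes the curricula are stored with a `.xml` extension. It converts the glob `*.xml` to a regular expression with `utils::glob2rx()` so that names such as `cv.xml.bak` are excluded.

```r
# Hypothetical helper (not part of getLattes): list the .xml files in `path`.
list_lattes_xml <- function(path = ".") {
  # glob2rx() turns a shell-style wildcard into an anchored regular expression
  xml_pattern <- utils::glob2rx("*.xml")
  list.files(path, pattern = xml_pattern, full.names = TRUE)
}

# usage, assuming the curricula are in the datatest directory:
# list_lattes_xml("datatest")
```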
    # to combine a list of data frames into one data frame
    library(dplyr)

    # to import from one curriculum
    getDadosGerais(xmlsLattes[[499]])

    # to import from two or more curricula
    lt <- lapply(xmlsLattes, getDadosGerais)
    head(bind_rows(lt))
    # to import from one curriculum
    getArtigosPublicados(xmlsLattes[[462]])

    # to import from two or more curricula
    lt <- lapply(xmlsLattes, getArtigosPublicados)
    head(bind_rows(lt))
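When importing from many curricula, a single malformed file could stop the whole `lapply()` call. One possible way to guard against that, sketched here in base R, is to wrap the extractor in `tryCatch()` and drop the curricula that fail. `safe_extract()` is a hypothetical helper and not part of getLattes; whether any given getLattes function raises errors on incomplete curricula is an assumption, not something this README states.

```r
# Hypothetical helper (not part of getLattes): apply `extractor` to each
# curriculum and keep only the results that were produced without error.
safe_extract <- function(curricula, extractor) {
  results <- lapply(curricula, function(cv) {
    # an error in one curriculum yields NULL instead of aborting the loop
    tryCatch(extractor(cv), error = function(e) NULL)
  })
  # drop the entries where the extractor failed or returned NULL
  Filter(Negate(is.null), results)
}

# usage, assuming xmlsLattes and dplyr are loaded as in the examples above:
# lt <- safe_extract(xmlsLattes, getArtigosPublicados)
# head(bind_rows(lt))
```

Silently skipping failures keeps a batch run going, at the cost of hiding which curricula were dropped; comparing `length(lt)` with `length(xmlsLattes)` shows how many were skipped.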