The getLattes R package, written by Roney Fraga Souza and Winicius Sabino, was built to extract data from the Lattes curriculum platform exported as XML.

Before importing, the XML file must be extracted from the downloaded .zip archive.
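The extraction step can be scripted in R. The following is a minimal sketch, not part of getLattes: `extract_lattes_zips` is a hypothetical helper, and it assumes that each archive is named after its curriculum ID and that the XML inside may share the same name across archives, so each extracted file is renamed after its archive.

```r
# Hypothetical helper (not part of getLattes): extract the XML from every
# .zip in `zip_dir` into `out_dir`, naming each XML after its archive.
extract_lattes_zips <- function(zip_dir, out_dir = "datatest") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  out <- character(0)

  for (z in zips) {
    # extract into a scratch directory so same-named entries cannot collide
    tmp <- tempfile("lattes_")
    dir.create(tmp)
    files <- utils::unzip(z, exdir = tmp)
    xmls <- files[grepl("\\.xml$", files, ignore.case = TRUE)]

    if (length(xmls) > 0) {
      id <- tools::file_path_sans_ext(basename(z))
      # add a numeric suffix only when one archive holds several XML files
      suffix <- if (length(xmls) > 1) paste0("_", seq_along(xmls)) else ""
      targets <- file.path(out_dir, paste0(id, suffix, ".xml"))
      file.copy(xmls, targets, overwrite = TRUE)
      out <- c(out, targets)
    }
    unlink(tmp, recursive = TRUE)
  }

  out
}

# Usage, assuming the downloaded archives are in a "downloads" directory:
# extract_lattes_zips("downloads", "datatest")
```

The function returns the paths of the XML files it wrote, which can then be read from `out_dir`.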
To automate the download process, please see Captchas Negated by Python reQuests - CNPQ.
You can install the released version of getLattes from GitHub with:
    # install and load devtools from CRAN
    install.packages("devtools")
    library(devtools)

    # install and load getLattes
    devtools::install_github("roneyfraga/getLattes")
    library(getLattes)
    # the file 4984859173592703.xml is stored in the datatest directory
    cl <- readLattes(filexml='4984859173592703.xml', path='datatest/')

    # import all Lattes XML files in datatest
    cls <- readLattes(filexml='*.xml$', path='datatest/')

    # import all Lattes XML files in the working directory
    cls <- readLattes(filexml='*.xml$')
    # to combine a list of data frames into a single data frame
    library(dplyr)

    # to import from one curriculum
    getDadosGerais(xmlsLattes[[499]])

    # to import from two or more curricula
    lt <- lapply(xmlsLattes, getDadosGerais)
    head(bind_rows(lt))
    # to import from one curriculum
    getArtigosPublicados(xmlsLattes[[462]])

    # to import from two or more curricula
    lt <- lapply(xmlsLattes, getArtigosPublicados)
    head(bind_rows(lt))
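When many curricula are processed with `lapply`, a single curriculum that fails to parse stops the whole run. One possible way around this is sketched below. `extract_all` is a hypothetical helper, not a getLattes function; it only assumes that the extractor takes one curriculum and returns a data frame, as `getDadosGerais` and `getArtigosPublicados` do in the examples above.

```r
library(dplyr)

# Hypothetical helper (not part of getLattes): apply one extractor to every
# curriculum, skip the curricula for which it fails, and row-bind the rest.
extract_all <- function(curricula, extractor) {
  results <- lapply(seq_along(curricula), function(i) {
    tryCatch(
      extractor(curricula[[i]]),
      error = function(e) {
        # report which curriculum was skipped, then carry on
        warning("curriculum ", i, " skipped: ", conditionMessage(e),
                call. = FALSE)
        NULL
      }
    )
  })
  # bind_rows ignores the NULL entries left by failed curricula
  bind_rows(results)
}

# Usage, assuming xmlsLattes is the list of curricula used above:
# artigos <- extract_all(xmlsLattes, getArtigosPublicados)
# head(artigos)
```

Skipped curricula are reported as warnings, so they can be inspected individually afterwards.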