Published April 16, 2026 | Version 1
Book chapter | Open access
O papel dos dados no pré-treinamento de Grandes Modelos de Linguagem (The Role of Data in the Pre-training of Large Language Models)
Authors/Creators
- Carpi, Miguel de Mello (Researcher)1
- Serras, Felipe Ribas (Researcher)1
- Sturzeneker, Mariana Lourenço (Researcher)1
- Palma, Mayara Feliciano (Researcher)1
- Lachi, Gabriela Alves (Researcher)1
- Costa, Aline Silva (Researcher)2
- Paixão de Sousa, Maria Clara (Researcher)1
- Namiuti, Cristiane (Researcher)2
- do Monte, Vanessa Martins (Researcher)1
- Finger, Marcelo (Researcher)1
Description
The large language models currently available, which are based on the Transformer architecture and its variants, have at least two main training stages: pre-training and fine-tuning. This chapter focuses on the pre-training phase. We give a general overview of how the training data influences the pre-training process, the resulting models, and their capabilities and characteristics. As in the rest of the book, we pay special attention to the Portuguese language and the resources available for it.
Files
| Name | Size |
|---|---|
| cap-modelos-dados.pdf (md5:0a667c570fe2b23e20a4e7cbefdf7494) | 316.6 kB |
Additional details
Related works
- Is part of
  - Book: 978-65-0200158-5 (ISBN)
- Is published in
  - Book: https://brasileiraspln.ufscar.br/livro-pln-4ed-vol2/modelos/cap-modelos-dados/cap-modelos-dados.html (URL)