Published April 16, 2026 | Version 1
Book chapter Open

The Role of Data in the Pre-training of Large Language Models

Description

The large language models available today, built on the Transformer architecture and its variants, have at least two main training stages: pre-training and fine-tuning. This chapter focuses on the pre-training stage. We give a general overview of how the data used shapes the pre-training process, the resulting models, and their capabilities and characteristics. As in the rest of the book, we pay particular attention to Portuguese and the resources available for it.

Files

cap-modelos-dados.pdf

Files (316.6 kB)

md5:0a667c570fe2b23e20a4e7cbefdf7494

Additional details

Related works

Is part of
Book: 978-65--0200158-5 (ISBN)
Is published in
Book: https://brasileiraspln.ufscar.br/livro-pln-4ed-vol2/modelos/cap-modelos-dados/cap-modelos-dados.html (URL)

Funding

Fundação de Amparo à Pesquisa do Estado de São Paulo
C4AI 2019/07665-4