RACKA: Regionális Adatokon Célzottan Kialakított Alapmodell (Foundation Model Purposefully Built on Regional Data)
Description
RACKA is a large language model (LLM), a text-generation AI created by researchers at ELTE by adapting the Qwen3-4B model to the Hungarian language. The adaptation used LoRA adapters and was trained on 130 million documents and code files in Hungarian, English, and German. RACKA can be used in both reasoning mode and chat mode, capabilities it retains in part from its base model. In line with international research practice, the model is named after an indigenous, wool-bearing, cloven-hoofed animal: the racka, a Hungarian sheep breed.
The RACKA model is one of the smallest Hungarian-language chat-capable models. Although it still has much to learn, it has already acquired a remarkably broad range of knowledge relative to its size. After certain model-reduction (quantization) steps, RACKA may become capable of running independently, locally, and offline, even on a mobile phone. This would allow Hungarian-language AI processing without expensive hardware, with the data security that comes from keeping everything on the user's own device. Training of the RACKA model is ongoing. At present, it is learning the rules of proper conversation and etiquette, as well as facts related to public life and science.
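To give a sense of why quantization matters for running on a phone, the following is a rough, weights-only memory estimate. It is a sketch under stated assumptions, not a measurement of RACKA: the parameter count is taken as exactly 4 billion (read off the "4B" in the base model's name), and activations, the KV cache, and any per-format overhead are ignored.

```python
# Rough weights-only memory estimate for a ~4-billion-parameter model.
# ASSUMPTION: exactly 4e9 parameters, inferred from the "4B" in Qwen3-4B.
# ASSUMPTION: no overhead for activations, KV cache, or quantization metadata.
PARAMS = 4_000_000_000


def weight_memory_gb(params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint from about 8 GB to about 2 GB, which is the difference between exceeding and fitting within the memory of many current phones.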
Language-based artificial intelligence models process textual input by splitting it into smaller units, known as tokens. Most tokenization solutions are developed with a primary focus on the English language. In the case of RACKA, we replaced part of the tokenizer so that it interprets text based on the elements most frequently occurring in Hungarian. As a result, processing becomes faster, the model is able to handle longer texts, and it makes fewer inflectional (morphological) errors.
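The effect of a Hungarian-aware vocabulary on token count can be illustrated with a toy example. This is not RACKA's tokenizer: the greedy longest-match splitter and both small vocabularies below are hypothetical, chosen only to show how the same word breaks into fewer pieces when the vocabulary contains common Hungarian stems and suffixes.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Falls back to a single character when no vocabulary entry matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # No entry matched: emit one character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


# HYPOTHETICAL vocabularies, for illustration only.
english_centric = {"in", "ban", "an", "ha"}
hungarian_aware = {"ház", "aink", "ban", "ak", "ok"}

word = "házainkban"  # "in our houses"

print(greedy_tokenize(word, english_centric))
print(greedy_tokenize(word, hungarian_aware))
```

With the English-centric toy vocabulary the word falls apart into seven pieces, several of them single characters; with the Hungarian-aware one it splits into three pieces that follow the stem and suffixes. Fewer tokens per word means more text fits into the same context window and each generation step covers more of the word.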
Komondor HPC, Hungary’s largest supercomputer, provided the computational capacity for training the RACKA model. The training was carried out over 12 days, using 64 A100 (40 GB) GPUs simultaneously, amounting to a total of 2 years and 1 month of cumulative GPU time. The estimated CO₂ footprint of the project is approximately 1,290 kg, which is comparable in magnitude to the environmental impact of a round-trip flight between Budapest and Rome. With environmental sustainability in mind, the waste heat generated by Komondor is reused to support the heating of the Debrecen Sports Swimming Complex.
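The cumulative GPU time follows directly from the figures above, as the short calculation below shows. It assumes that all 64 GPUs were in use for the full 12 days and that a year has 365 days. The CO₂ per GPU-hour is a value derived here from the stated totals, not a figure reported by the project.

```python
DAYS = 12        # training duration stated above
GPUS = 64        # number of A100 GPUs stated above
CO2_KG = 1290    # estimated CO2 footprint stated above

# ASSUMPTION: every GPU was busy for the whole 12 days.
gpu_days = DAYS * GPUS
gpu_hours = gpu_days * 24

# Convert GPU-days to years and months (365-day year, average month length).
years, rem_days = divmod(gpu_days, 365)
months = round(rem_days / (365 / 12))

# Derived, not reported: average emissions per GPU-hour implied by the totals.
co2_per_gpu_hour = CO2_KG / gpu_hours

print(f"{gpu_days} GPU-days = {gpu_hours} GPU-hours")
print(f"about {years} years and {months} month(s) of GPU time")
print(f"about {co2_per_gpu_hour:.3f} kg CO2 per GPU-hour")
```

The result, 768 GPU-days, is roughly 2 years and 1 month, which matches the cumulative figure given in the description.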
Files
| Name | Size | MD5 |
|---|---|---|
| Racka poster.pdf | 773.5 kB | 1e7396d373796f97ab04c998bd39bc6d |
Additional details
Dates
- Created: 2025