Optimizing Large Language Models for Domain-Specific Tasks
Creators
Description
This research explores the integration of Large Language Models (LLMs) with domain-specific datasets, focusing on optimizing model training in centralized environments. Using the TSpec-LLM dataset, a collection of telecommunications standards documentation, we developed a modular preprocessing pipeline to clean and structure the data. We fine-tuned pre-trained T5 models for domain-specific question answering and compared their performance against retrieval-augmented generation (RAG) approaches. Evaluation metrics, including BLEU and cosine similarity, showed that while RAG excels at knowledge-intensive tasks, fine-tuned models provide efficient solutions for tasks with well-defined datasets. This study highlights the potential of centralized LLM systems for advancing domain-specific AI applications in telecommunications, building on methodologies from prior work.
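To illustrate the fine-tuning setup described above, the sketch below shows how a pre-trained T5 model could be adapted for domain-specific question answering with Hugging Face transformers. It is a minimal sketch rather than the project's actual code: the checkpoint (`t5-base`), the example QA pairs, the tokenization lengths, the output directory, and the hyperparameters are all illustrative assumptions.

```python
# Minimal, illustrative sketch (not the project's released code) of fine-tuning a
# pre-trained T5 model for domain-specific question answering with Hugging Face
# transformers. Checkpoint, data, paths, and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (
    T5ForConditionalGeneration,
    T5TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical QA pairs, as might be produced by a preprocessing pipeline
# over telecommunications standards text.
examples = [
    {"question": "Which 3GPP series covers NR radio access?", "answer": "The 38 series."},
    {"question": "What does RRC stand for?", "answer": "Radio Resource Control."},
]
dataset = Dataset.from_list(examples)

def tokenize(batch):
    # T5 is text-to-text: the question becomes the encoder input, the answer the target.
    model_inputs = tokenizer(
        ["question: " + q for q in batch["question"]],
        max_length=512, truncation=True, padding="max_length",
    )
    targets = tokenizer(batch["answer"], max_length=64, truncation=True, padding="max_length")
    # Mask padding tokens in the labels so they are ignored by the loss.
    model_inputs["labels"] = [
        [(tok if tok != tokenizer.pad_token_id else -100) for tok in seq]
        for seq in targets["input_ids"]
    ]
    return model_inputs

tokenized = dataset.map(tokenize, batched=True, remove_columns=["question", "answer"])

args = TrainingArguments(
    output_dir="t5-telecom-qa",        # illustrative output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
)

Trainer(model=model, args=args, train_dataset=tokenized).train()
```

Answers generated by such a model could then be scored against reference answers using corpus-level BLEU and embedding cosine similarity, the metrics named in the description above.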
Files
| Name | Size | Checksum (md5) |
|---|---|---|
| Project-CS-2024.pdf | 697.0 kB | 2bc064c9dc735dfbc6133ae25f873b20 |