From Document-Level to Segment-Level: LLM-Based Terminology Extraction for Translation Workflows
Description
This preprint introduces an approach to terminology extraction for translation workflows, called Segment-Level: LLM-Based Terminology Extraction. Traditional terminology extraction methods typically operate at the document or corpus level, which may limit their usefulness in Computer-Assisted Translation (CAT) environments, where translators work primarily with individual segments.
Based on practical experience with CAT tools such as Phrase, Trados, and Crowdin, as well as student-based experiments evaluating AI terminology extraction capabilities, this study identifies limitations of document-level extraction. These limitations include increased noise, reduced domain focus, over-extraction of general vocabulary, and limited usefulness for real-time translation workflows.
To address these challenges, this paper proposes Segment-Level: LLM-Based Terminology Extraction, a prompt-based approach using Large Language Models (LLMs) to extract concept-based terminology candidates directly from individual source segments. The method is guided by ISO 704 terminology principles and emphasizes concept-oriented, domain-relevant, and translation-relevant terminology selection.
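The per-segment, prompt-based flow described above can be illustrated with a minimal Python sketch. It is not the preprint's actual prompt or implementation: the prompt wording, the JSON answer format, the function names (`build_prompt`, `parse_candidates`, `extract_terms`), and the `llm` callable are all assumptions made for illustration. The model is passed in as a plain function so that any LLM client can be plugged in.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt wording; the preprint's real prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You are a terminologist following ISO 704 principles.\n"
    "Domain: {domain}\n"
    "From the source segment below, list only terms that designate "
    "domain-specific concepts and matter for translation. "
    "Skip general vocabulary.\n"
    "Answer with a JSON array of strings and nothing else.\n"
    "Segment: {segment}"
)


def build_prompt(segment: str, domain: str) -> str:
    """Fill the template for one source segment."""
    return PROMPT_TEMPLATE.format(domain=domain, segment=segment)


def parse_candidates(response: str, segment: str) -> List[str]:
    """Parse the model answer, keeping only terms that occur in the segment."""
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return []  # unusable answer: return no candidates rather than guess
    if not isinstance(items, list):
        return []
    lowered = segment.lower()
    seen = set()
    kept: List[str] = []
    for item in items:
        if not isinstance(item, str):
            continue
        term = item.strip()
        key = term.lower()
        # Drop empty strings, case-insensitive duplicates, and terms
        # that do not literally appear in the segment.
        if term and key in lowered and key not in seen:
            seen.add(key)
            kept.append(term)
    return kept


def extract_terms(
    segments: List[str], domain: str, llm: Callable[[str], str]
) -> Dict[int, List[str]]:
    """Run one prompt per segment and map segment index to term candidates."""
    return {
        index: parse_candidates(llm(build_prompt(segment, domain)), segment)
        for index, segment in enumerate(segments)
    }
```

Checking each candidate against the segment text is a design choice of this sketch, not a step named in the preprint; it is one simple way to discard candidates the model may have invented.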
A preliminary micro-study suggests that, compared with document-level extraction, segment-level extraction:
- reduces noise in term candidate selection
- improves domain relevance
- reduces over-extraction
- enhances translation consistency
- improves terminology relevance in translation workflows
The proposed method introduces a new direction for terminology extraction and may support real-time terminology assistance in CAT tools, terminology management, machine translation customization, and human-in-the-loop translation workflows.
This preprint presents the conceptual framework, methodology, prompt design, and preliminary observations supporting the feasibility of Segment-Level: LLM-Based Terminology Extraction.