DPO and SFT Comparison in LLM Counter-Speech Argumentation Across Languages
Description
The automatic generation of counter-speech (CS) is a critical strategy for addressing hate speech by providing constructive and informed responses. However, existing methods often fail to generate high-quality, impactful, and scalable CS, particularly across diverse linguistic contexts. In this paper, we propose a novel methodology to enhance CS generation by aligning Large Language Models (LLMs) using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Our approach leverages DPO to align LLM outputs with human preferences, ensuring contextually appropriate and linguisticall
Research goal: How does DPO, compared with SFT, affect the argumentative-strength metrics of LLM-generated counter-speech across diverse linguistic contexts in alignment evaluations?
Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.3/10.
Files
| Name | Size |
|---|---|
| paper.pdf (md5:75ebdb084a74929f854887ca07284a9d) | 87.0 kB |