Dominique: An AI-Powered Fact-Checking Chatbot for Democratizing Access to Reliable Information
Creators
- Tavares, Denis (Researcher)1
- Barreto, Marina (Researcher)1
- Silva de Almeida, Breno Livio (Researcher)1, 2, 3
- Florentino, Bruno (Researcher)1
- Tetzner, Felipe (Researcher)1
- Tegoni Goedert, Guilherme (Researcher)4
- Struchiner, Claudio4
- Parmezan Bonidia, Robson (Researcher)5
- de Carvalho, Andre (Researcher)1
Description
This repository contains two datasets developed for research in Brazilian Portuguese fake news detection:
- Golden Dataset: The primary dataset, comprising 22,044 unique news articles (11,145 fake, 10,899 true) in Brazilian Portuguese. It was created by merging and deduplicating three established corpora, Fake.Br, FakeTrueBR, and FakeRecogna, to form a larger, more robust, and balanced resource. It includes extensive metadata such as source, publication date, author, and linguistic features to support the development of advanced machine learning models.
- Gemini Validation Dataset: A synthetic, health-focused dataset of 1,000 news instances (labeled as true or fake) generated using Google's Gemini LLM. This dataset was specifically created for external validation to test the generalization capability of trained models on unseen, out-of-distribution topics, simulating a real-world fact-checking scenario.
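The merge-and-deduplicate step described for the Golden Dataset can be illustrated with a minimal sketch. This is not the authors' pipeline: the record keys (`text`, `label`, `source`), the normalization rules, and the toy rows are all assumptions made for illustration, and the real corpora may have been matched on different criteria.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """Normalize text so trivially different copies compare as equal.

    Assumption: Unicode NFKC normalization, case folding, and whitespace
    collapsing are enough to detect duplicates in this toy example.
    """
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def merge_and_deduplicate(*corpora):
    """Concatenate corpora and keep the first occurrence of each article."""
    seen = set()
    merged = []
    for corpus in corpora:
        for row in corpus:
            # Hash the normalized text so the set stores short fixed-size keys.
            key = hashlib.sha256(normalize(row["text"]).encode("utf-8")).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            merged.append(row)
    return merged


# Hypothetical toy rows, not taken from the published datasets.
corpus_a = [{"text": "Vacina causa  doença.", "label": "fake", "source": "A"}]
corpus_b = [
    {"text": "vacina causa doença.", "label": "fake", "source": "B"},
    {"text": "Ministério divulga novo boletim.", "label": "true", "source": "B"},
]

merged = merge_and_deduplicate(corpus_a, corpus_b)
print(len(merged))  # 2: the second copy of the first article is dropped
```

Keeping the first occurrence means the order in which corpora are passed decides which metadata survives for a duplicated article, so that order is a design choice worth making explicit.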
Files
gemini_dataset.csv
Total size: 201.4 MB

| Checksum | Size |
|---|---|
| md5:7f243e09f5215c0365b21957cc58a646 | 2.7 MB |
| md5:78fc3508594e6aeb9bf0515c8831d014 | 198.8 MB |
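Since each file is published with an MD5 checksum, a download can be checked for corruption before use. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the caller supplies the local path and the checksum string copied from the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("gemini_dataset.csv", "md5:...")` with the checksum copied from the listing returns `True` only when the local file is byte-identical to the published one. MD5 is adequate for detecting accidental corruption, but it is not a defense against deliberate tampering.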
Additional details
Software
- Repository URL: https://github.com/autoaihub/Dominique
- Programming language: Python
- Development Status: Active