Published April 3, 2026 | Version v2
Dataset Open

Dominique: An AI-Powered Fact-Checking Chatbot for Democratizing Access to Reliable Information

Description

This repository contains two datasets developed for research in Brazilian Portuguese fake news detection:

  1. Golden Dataset: The primary dataset, comprising 22,044 unique news articles in Brazilian Portuguese (11,145 fake and 10,899 true, about 50.6% fake). It was created by merging and deduplicating three established corpora (Fake.Br, FakeTrueBR, and FakeRecogna) to form a larger, near-balanced resource. It includes metadata such as source, publication date, author, and linguistic features to support the development of machine learning models.

  2. Gemini Validation Dataset: A synthetic, health-focused dataset of 1,000 news instances (labeled as true or fake) generated using Google's Gemini LLM. This dataset was specifically created for external validation to test the generalization capability of trained models on unseen, out-of-distribution topics, simulating a real-world fact-checking scenario.
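
The merge-and-deduplicate step described for the Golden Dataset can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not say how duplicates were detected, so exact matching on normalized text is an assumption, and the names `normalize_text` and `merge_and_deduplicate`, the record layout, and the toy articles are all hypothetical.

```python
import unicodedata
from collections import Counter


def normalize_text(text):
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return " ".join(text.split())


def merge_and_deduplicate(corpora):
    """Merge {source_name: [(text, label), ...]} into one list of records.

    Assumption: two articles are duplicates when their normalized text is
    identical. The first occurrence is kept, so corpus order decides which
    source a shared article is attributed to.
    """
    seen = set()
    merged = []
    for source, records in corpora.items():
        for text, label in records:
            key = normalize_text(text)
            if key in seen:
                continue  # already contributed by an earlier corpus
            seen.add(key)
            merged.append({"text": text, "label": label, "source": source})
    return merged


if __name__ == "__main__":
    # Toy stand-ins for the three corpora; the texts are invented.
    toy_corpora = {
        "Fake.Br": [
            ("Vacina causa   doença", "fake"),
            ("Governo anuncia novo programa", "true"),
        ],
        "FakeTrueBR": [("vacina causa doença", "fake")],
        "FakeRecogna": [("Chá cura todas as doenças", "fake")],
    }
    merged = merge_and_deduplicate(toy_corpora)
    print(len(merged), Counter(record["label"] for record in merged))
```

Exact matching only catches articles that are identical after normalization. Near-duplicates, such as the same story with a different headline, would need a fuzzy similarity measure instead.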

Files

gemini_dataset.csv

Files (86.7 MB)

Checksum                                Size
md5:7f243e09f5215c0365b21957cc58a646    2.7 MB
md5:5feb7694d3c00a17c6eb03822421fa1f    84.1 MB
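
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` and the local path in the usage note are hypothetical, and the listing does not state which file name belongs to which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a local file against a listed value such as 'md5:7f24...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloads/some_file.csv", "md5:7f243e09f5215c0365b21957cc58a646")` returns `True` only if the local copy is byte-for-byte identical to the published file with that checksum. Reading in chunks keeps memory use flat for the larger 84.1 MB file.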

Additional details

Software

Repository URL: https://github.com/autoaihub/Dominique
Programming language: Python
Development Status: Active