Published February 19, 2025 | Version v6
Dataset Open

FoodSky: A Food-oriented Large Language Model, and FoodEarth: A Fundamental Food Corpus and Instruction Dataset

  • 1. Institute of Computing Technology
  • 2. Chinese Academy of Sciences

Description

Food is the cornerstone of both survival and social life. With the increasing complexity of global dietary needs and preferences, there is a growing demand for food intelligence to enable tasks like recipe recommendation and diet-disease correlation discovery. To address this, we introduce the Food-oriented Large Language Model (LLM) FoodSky, which offers fine-grained perception and reasoning of food data. We constructed a food corpus, FoodEarth, from various authoritative sources to enhance FoodSky's knowledge. We also developed the Topic-based Selective State Space Model and Hierarchical Topic Retrieval Augmented Generation algorithms to improve FoodSky's ability to capture fine-grained food semantics and generate context-aware food-relevant text. Extensive experiments show that FoodSky outperforms general-purpose LLMs on the Chinese National Chef Exam and Dietetic Exam, achieving accuracies of 67.2% and 66.4%, respectively. FoodSky not only enhances culinary creativity and promotes healthier eating patterns but also establishes a new standard for domain-specific LLMs tackling real-world food-related issues.

Notes

All versions of the FoodEarth dataset have been uploaded, including:

  • FoodEarth-811K+-full.json: The complete processed version, containing over 811K instruction records.
  • FoodEarth_examples.json: A demo file showing the basic structure of the dataset.
  • FoodEarth-mini.json: A reduced version with 200K instruction records, for lightweight use.
  • FoodEarth_instruction_v1_76w.json: An older version of the FoodEarth dataset.
  • FoodEarth-680K.json: The subset of the dataset used in the ablation studies.
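
As a starting point for inspecting any of the files listed above, the following is a minimal Python sketch. The record schema is not documented on this page, so the code makes no assumption about field names: it only assumes that a file is either a single JSON document or JSON Lines text. The helper names `load_records` and `summarize` are hypothetical and are not part of the FoodSky repository.

```python
import json
from pathlib import Path


def load_records(path):
    """Load records from a JSON file or, as a fallback, a JSON Lines file.

    Assumption: the on-disk layout is one of these two; this page does not say which.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a single top-level object so callers always receive a list.
    return data if isinstance(data, list) else [data]


def summarize(records):
    """Return (record count, sorted list of keys seen across dict records)."""
    keys = set()
    for record in records:
        if isinstance(record, dict):
            keys.update(record)
    return len(records), sorted(keys)
```

Calling `summarize(load_records("FoodEarth_examples.json"))` on the small demo file first is a cheap way to learn the actual field names before loading the full 811K+ file, which is read entirely into memory by this sketch.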

Files

FoodEarth-Complete.zip

Files (657.6 MB)

  • md5:fde7d7dfbc099b9240ea565fc77c7af7 (589.1 MB)
  • md5:96e871056014a8a4399630d13fae823b (68.6 MB)
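
The MD5 digests listed above can be used to check a download for corruption. The sketch below is one way to do that in Python with the standard `hashlib` module. The two digests are copied from this record; the function names are hypothetical, and the path passed in is whatever local name the downloaded file was saved under, since the file names are not shown next to the digests here.

```python
import hashlib

# MD5 digests as listed on this record.
LISTED_MD5 = {
    "fde7d7dfbc099b9240ea565fc77c7af7",  # 589.1 MB file
    "96e871056014a8a4399630d13fae823b",  # 68.6 MB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals one of the digests listed on this record."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks keeps memory use flat for the larger file. MD5 is adequate here for detecting transfer errors, though it is not a defense against deliberate tampering.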

Additional details

Dates

Accepted
2025-02-19

Software

Repository URL
https://github.com/LanceZPF/FoodSky
Programming language
Python

References

  • @article{zhou2024foodsky, title={FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination}, author={Zhou, Pengfei and Min, Weiqing and Fu, Chaoran and Jin, Ying and Huang, Mingyu and Li, Xiangyang and Mei, Shuhuan and Jiang, Shuqiang}, journal={arXiv preprint arXiv:2406.10261}, year={2024} }