Published January 14, 2026 | Version v1
Dataset Open

MemeChain: A Multimodal Cross-Chain Dataset for Meme Coin Forensics and Risk Analysis

  • 1. Sapienza University of Rome
  • 2. Technical University of Denmark

Description

This repository contains the dataset presented in the paper "MemeChain: A Multimodal Cross-Chain Dataset for Meme Coin Forensics and Risk Analysis".

 

MemeChain provides a comprehensive, multimodal snapshot of 34,988 meme coins spanning four major blockchains (Ethereum, BSC, Base, and Solana). MemeChain is designed to support research in financial forensics, scam detection (including rug pulls and honeypots), and heterogeneous graph mining.

 

Dataset Organization

 

The dataset is organized under the `data/` directory to support interoperability.
Files are grouped into four logical modules: Token Registry, Financial Metadata, Digital Presence, and Visual Assets.

 

 1. Token Registry (Core Identity)

Master lists for token identification and validation.
 
| File Name | Description | Key Columns |
| --- | --- | --- |
| `confirmed_meme_coins.csv` | The raw master list of all 34,988 identified tokens. | `symbol`, `name`, `platform`, `address` |
| `final_filtered_meme_coins.csv` | A refined subset (N=31,811) with verified deployment timestamps. Use this for temporal analysis. | `address`, `platform`, `smart_contract_creation_date` |

 

2. Financial Metadata

Market performance metrics.
 
| File Name | Description | Key Columns |
| --- | --- | --- |
| `economic_info_data.csv` | Market snapshots for 20,692 active tokens. | `price`, `market_cap`, `volume`, `creation_date` |

 

3. Digital Presence & Social Forensics

Off-chain artifacts for analyzing web infrastructure and community intent.
 
| File Name | Description | Key Columns |
| --- | --- | --- |
| `web_page_analysis.csv` | Detailed domain forensics and security flags. Includes ChainPatrol scam labels. | `url`, `registrar`, `expiration_date`, `chain_patrol_status`, `domain_name` |
| `social_info.csv` | Verified official social media endpoints. | `website`, `twitter` (X), `telegram`, `discord` |
| `html_web_pages_path.csv` | Mapping file linking a token to its local HTML file path. | `address`, `platform`, `html_path` |
| `html_web_pages.zip` | [Raw Data] Archive containing the HTML code of project websites. | Organized by folders: `{blockchain}/{token_name}_{address}.html` |
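As a rough sketch of how the mapping file and the archive fit together, the snippet below reads one page directly from a zip archive without extracting it. It assumes that `html_path` values are relative to the root of `html_web_pages.zip`, which the description does not state explicitly; the helper `read_archived_page` and the sample entry are hypothetical, and a tiny in-memory archive stands in for the real one.

```python
import io
import zipfile


def read_archived_page(archive, html_path):
    """Return the HTML text stored at html_path inside an open ZipFile."""
    with archive.open(html_path) as fh:
        # Scraped pages may not be valid UTF-8, so undecodable bytes are replaced.
        return fh.read().decode("utf-8", errors="replace")


# Build a small in-memory archive that mimics the documented layout
# {blockchain}/{token_name}_{address}.html (the entry itself is made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ethereum/DogeX_0xAAA.html", "<html><title>Doge X</title></html>")

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    page = read_archived_page(zf, "ethereum/DogeX_0xAAA.html")

print(page)
```

With the real data, `zipfile.ZipFile("data/html_web_pages.zip")` would replace the in-memory buffer, and the path would come from the `html_path` column; if the archive has an extra top-level folder, the path may need a prefix.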

 

4. Visual Assets

Multimodal assets for computer vision and brand impersonation studies.
 
 
| File Name | Description | Key Columns |
| --- | --- | --- |
| `memes_with_image_paths.csv` | Mapping file linking a token to its local image file path. | `address`, `platform`, `path` |
| `logos_images.zip` | [Raw Data] Archive containing token logos/icons. | Organized by folders: `{blockchain}/{token_name}_{address}.jpg` |
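The folder convention above can be turned back into its components when only a file path is at hand. The following is a minimal sketch, assuming the path follows `{blockchain}/{token_name}_{address}.jpg` exactly; the function `parse_asset_path` and the sample paths are hypothetical. It splits on the last underscore because token names may themselves contain underscores, whereas hexadecimal and base58 addresses do not.

```python
from pathlib import PurePosixPath


def parse_asset_path(path):
    """Split '{blockchain}/{token_name}_{address}.ext' into its three parts."""
    p = PurePosixPath(path)
    blockchain = p.parent.name
    # rpartition splits on the LAST underscore, so underscores inside the
    # token name are preserved.
    token_name, _, address = p.stem.rpartition("_")
    return blockchain, token_name, address


print(parse_asset_path("solana/Moon_Cat_So1BBB.jpg"))
print(parse_asset_path("ethereum/DogeX_0xAAA.jpg"))
```

In practice the `(address, platform)` columns of `memes_with_image_paths.csv` are the more reliable source; parsing the path is mainly useful as a consistency check.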

 



Data Completeness & "Faceless" Tokens

Logo images are provided for approximately 15,000 tokens (~43% of the dataset); the remaining tokens have no associated image file. There are two points to keep in mind:
 
  • The ERC-20 and SPL token standards are lightweight and do not store images on-chain. Visual metadata requires manual submission by developers to off-chain registries (e.g., Trust Wallet, Etherscan).
  • Hypothesis: The absence of a logo is often a strong behavioral signal of low-effort deployments or bot-generated scams.
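To use logo absence as a feature, one option is to flag every registry token that has no entry in the image mapping file. The sketch below assumes that "faceless" tokens are either missing from `memes_with_image_paths.csv` or have an empty `path`, which is an interpretation rather than something the description guarantees. The sample rows, the platform strings, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical sample rows standing in for data/confirmed_meme_coins.csv
# and data/memes_with_image_paths.csv.
REGISTRY_CSV = """symbol,name,platform,address
DOGEX,Doge X,ethereum,0xAAA
MOONC,Moon Cat,solana,So1BBB
"""
IMAGE_PATHS_CSV = """address,platform,path
0xAAA,ethereum,ethereum/DogeX_0xAAA.jpg
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def faceless_tokens(registry_rows, image_rows):
    """Return registry rows with no logo, matched on (address, platform)."""
    with_logo = {
        (row["address"], row["platform"])
        for row in image_rows
        if row["path"].strip()  # treat an empty path as "no logo" (assumption)
    }
    return [
        row for row in registry_rows
        if (row["address"], row["platform"]) not in with_logo
    ]


faceless = faceless_tokens(read_rows(REGISTRY_CSV), read_rows(IMAGE_PATHS_CSV))
print([row["symbol"] for row in faceless])
```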

 

🔗 How to Link Data (Interoperability)

All CSV files and raw assets share a unified composite key. You can join any file in this dataset using the tuple `(address, platform)`.
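A join on this composite key might look like the following minimal sketch, which uses only the Python standard library. The sample rows and platform strings are hypothetical, and the helper names `key_of` and `left_join` are illustrative. Addresses are compared exactly as stored, since Solana addresses are case-sensitive and normalizing case could merge distinct tokens.

```python
import csv
import io

# Hypothetical sample rows standing in for data/confirmed_meme_coins.csv
# and data/social_info.csv; the real files contain more columns.
REGISTRY_CSV = """symbol,name,platform,address
DOGEX,Doge X,ethereum,0xAAA
MOONC,Moon Cat,solana,So1BBB
"""
SOCIAL_CSV = """address,platform,website,twitter
0xAAA,ethereum,https://example.org,
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def key_of(row):
    """Composite join key shared by all files: (address, platform)."""
    return (row["address"], row["platform"])


def left_join(left_rows, right_rows):
    """Attach matching right-hand columns to every left-hand row."""
    right_index = {key_of(row): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_index.get(key_of(row), {})
        # Left-hand values win if both sides define the same column.
        joined.append({**match, **row})
    return joined


rows = left_join(read_rows(REGISTRY_CSV), read_rows(SOCIAL_CSV))
for row in rows:
    print(row["symbol"], row.get("website", "<no social record>"))
```

For larger analyses, the same left join can be expressed in pandas with `DataFrame.merge(other, on=["address", "platform"], how="left")`.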

Files

data.zip

Files (709.7 MB)

Size Checksum
709.6 MB md5:ee341d14684471ce1da6a8599b422e6e
3.5 kB md5:30e4db153e85fbfaf27ed9041349fe04