# MemeChain: A Multimodal Cross-Chain Meme Coin Dataset

This repository contains the dataset presented in the paper **"MemeChain: A Multimodal Cross-Chain Dataset for Meme Coin Forensics and Risk Analysis"**. 

MemeChain provides a multimodal snapshot of **34,988 meme coins** spanning four major blockchains (**Ethereum, BSC, Base, Solana**). It is designed to support research in financial forensics, scam detection (rug pulls, honeypots), and heterogeneous graph mining.

---

## 📂 Dataset Organization

The dataset lives in the `data/` directory, with a consistent layout across files for **interoperability**.
Files are grouped into four logical modules: **Token Registry**, **Financial Metadata**, **Digital Presence**, and **Visual Assets**.

### 1. Token Registry (Core Identity)
*Master lists for token identification and validation.*

| File Name | Description | Key Columns |
| :--- | :--- | :--- |
| **`confirmed_meme_coins.csv`** | The raw master list of all 34,988 identified tokens. | `symbol`, `name`, `platform`, `address` |
| **`final_filtered_meme_coins.csv`** | A refined subset (N=31,811) with verified **deployment timestamps**. Use this for temporal analysis. | `address`, `platform`, `smart_contract_creation_date` |
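
As an illustration of the temporal analysis this file enables, the sketch below counts deployments per platform and month. It assumes `smart_contract_creation_date` is an ISO 8601 string that `datetime.fromisoformat` can parse; the actual format in the file may differ, so adjust the parsing accordingly. The sample rows are made-up placeholders, not real dataset entries.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def deployments_per_month(rows):
    """Count tokens per (platform, 'YYYY-MM') deployment month."""
    counts = Counter()
    for row in rows:
        # Assumption: the timestamp is ISO 8601 (e.g. '2024-03-15 12:00:00').
        created = datetime.fromisoformat(row["smart_contract_creation_date"])
        counts[(row["platform"], created.strftime("%Y-%m"))] += 1
    return counts


# Placeholder rows standing in for data/final_filtered_meme_coins.csv.
sample_csv = (
    "address,platform,smart_contract_creation_date\n"
    "0xaaa,ethereum,2024-03-15 12:00:00\n"
    "0xbbb,ethereum,2024-03-20 08:30:00\n"
    "So1ccc,solana,2024-04-01 00:00:00\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
monthly = deployments_per_month(rows)
```

To run this on the real file, replace the `io.StringIO(sample_csv)` source with `open("data/final_filtered_meme_coins.csv", newline="", encoding="utf-8")`.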

### 2. Financial Metadata
*Market performance metrics.*

| File Name | Description | Key Columns |
| :--- | :--- | :--- |
| **`economic_info_data.csv`** | Market snapshots for 20,692 active tokens. | `price`, `market_cap`, `volume`, `creation_date` |

### 3. Digital Presence & Social Forensics
*Off-chain artifacts for analyzing web infrastructure and community intent.*

| File Name | Description | Key Columns |
| :--- | :--- | :--- |
| **`web_page_analysis.csv`** | Detailed domain forensics and security flags. Includes **ChainPatrol** scam labels. | `url`, `registrar`, `expiration_date`, `chain_patrol_status`, `domain_name` |
| **`social_info.csv`** | Verified official social media endpoints. | `website`, `twitter` (X), `telegram`, `discord` |
| **`html_web_pages_path.csv`** | Mapping file linking a token to its local HTML file path. | `address`, `platform`, `html_path` |
| **`html_web_pages.zip`** | **[Raw Data]** Archive containing the source code of project websites. | Organized by folders: `blockchain/{token_name}_{address}.html` |

### 4. Visual Assets
*Multimodal assets for computer vision and brand impersonation studies.*

| File Name | Description | Key Columns |
| :--- | :--- | :--- |
| **`memes_with_image_paths.csv`** | Mapping file linking a token to its local image file path. | `address`, `platform`, `path` |
| **`logos_images.zip`** | **[Raw Data]** Archive containing token logos/icons. | Organized by folders: `blockchain/{token_name}_{address}.png` |
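
A minimal sketch of reading one logo straight from the archive without extracting it. It assumes the `path` value in `memes_with_image_paths.csv` matches the member name inside `logos_images.zip`; if the stored paths carry a different prefix, they need to be adjusted first. The helper name `read_asset` and the in-memory archive are illustrative only.

```python
import io
import zipfile


def read_asset(archive, member_path):
    """Return the raw bytes of one archive member, or None if it is absent."""
    try:
        return archive.read(member_path)
    except KeyError:  # ZipFile.read raises KeyError for unknown members
        return None


# Build a tiny in-memory archive as a stand-in for logos_images.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ethereum/TokenA_0xaaa.png", b"\x89PNG placeholder bytes")

with zipfile.ZipFile(buffer) as zf:
    logo = read_asset(zf, "ethereum/TokenA_0xaaa.png")
    missing = read_asset(zf, "solana/Nope_abc.png")
```

With the real data, open `zipfile.ZipFile("data/logos_images.zip")` instead of the in-memory buffer. The same helper should work for `html_web_pages.zip` under the equivalent assumption about `html_path`.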

---

## ⚠️ Data Completeness & "Faceless" Tokens
Logo images are provided for approximately **15,000 tokens** (~43% of the dataset); the remaining tokens have no associated image file.

* The ERC-20 and SPL token standards are lightweight and do not store images on-chain. Visual metadata requires manual submission by developers to off-chain registries (e.g., Trust Wallet, Etherscan). 
* **Hypothesis:** The absence of a logo may itself be a behavioral signal of **low-effort deployments** or **bot-generated scams**.
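
To study this hypothesis, tokens can be split into those with and without a logo. The sketch below assumes a token is "faceless" when it has no row, or an empty `path`, in `memes_with_image_paths.csv`; check how missing logos are actually encoded in the file before relying on it. The sample rows are made-up placeholders.

```python
import csv
import io


def split_by_logo(registry_rows, image_rows):
    """Split registry rows into (with_logo, faceless) using (address, platform)."""
    # Assumption: a missing row or an empty 'path' means no logo is available.
    logo_keys = {
        (r["address"], r["platform"]) for r in image_rows if r.get("path")
    }
    with_logo, faceless = [], []
    for row in registry_rows:
        key = (row["address"], row["platform"])
        (with_logo if key in logo_keys else faceless).append(row)
    return with_logo, faceless


# Placeholder rows standing in for the registry and the image mapping file.
registry_csv = (
    "symbol,name,platform,address\n"
    "AAA,Token A,ethereum,0xaaa\n"
    "BBB,Token B,solana,So1bbb\n"
    "CCC,Token C,bsc,0xccc\n"
)
images_csv = (
    "address,platform,path\n"
    "0xaaa,ethereum,ethereum/TokenA_0xaaa.png\n"
    "0xccc,bsc,\n"
)
registry = list(csv.DictReader(io.StringIO(registry_csv)))
images = list(csv.DictReader(io.StringIO(images_csv)))
with_logo, faceless = split_by_logo(registry, images)
```

The resulting `faceless` list can then serve as a binary feature alongside the other modules.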

---

## 🔗 How to Link Data (Interoperability)
All CSV files share a unified composite key, and the raw assets are reachable through the mapping files that carry it. You can join any file in this dataset using the tuple:
> **`(address, platform)`**
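
A minimal sketch of such a join using only the Python standard library. The helper names `index_by_key` and `left_join` are illustrative, and the sample rows are made-up placeholders rather than real dataset entries.

```python
import csv
import io


def index_by_key(rows):
    """Index rows by the composite key (address, platform)."""
    return {(r["address"], r["platform"]): r for r in rows}


def left_join(left_rows, right_rows):
    """Left-join two row lists on (address, platform).

    Columns from the right side are added to each left row; on a name
    clash the left value wins. Unmatched left rows are kept unchanged.
    """
    right = index_by_key(right_rows)
    joined = []
    for row in left_rows:
        match = right.get((row["address"], row["platform"]), {})
        joined.append({**match, **row})
    return joined


# Placeholder rows standing in for two of the CSV files.
registry_csv = (
    "symbol,name,platform,address\n"
    "AAA,Token A,ethereum,0xaaa\n"
    "BBB,Token B,solana,So1bbb\n"
)
paths_csv = (
    "address,platform,path\n"
    "0xaaa,ethereum,ethereum/TokenA_0xaaa.png\n"
)
registry = list(csv.DictReader(io.StringIO(registry_csv)))
paths = list(csv.DictReader(io.StringIO(paths_csv)))
merged = left_join(registry, paths)
```

If you work with pandas instead, `DataFrame.merge` with `on=["address", "platform"]` and `how="left"` performs the same join.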