Instrumetriq — Crypto Market Activity & Sentiment Context Dataset (Weekly Samples)
Description
This dataset provides time-aligned observational snapshots of crypto market activity and social sentiment across 270+ crypto assets, designed to contextualize market structure, liquidity, and attention dynamics rather than produce forecasts or signals.
The archive contains weekly Sunday samples drawn from Instrumetriq's continuous monitoring pipeline.
Data Collection
Spot market data sourced from Binance
- Mid prices, bid–ask spreads, liquidity percentiles
- Aggregated per observation window
Social sentiment data sourced from X (Twitter)
- Posts are continuously collected and classified using a hybrid transformer-based sentiment system
- Sentiment is exposed only in aggregated form (counts and averages)
- Each asset is monitored in ~2-hour observation cycles, producing one row per asset per session
- Approximately 2,500 observations per day, regardless of tier
Archive Contents
This archive contains:
- 7 weekly Sunday samples (2025-12-21 through 2026-02-01)
- Three dataset tiers per week, sharing the same observations but differing in schema depth
- Apache Parquet format with Snappy compression
- Schema documentation for all tiers
- Methodology overview
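The seven sample dates follow directly from the stated range. A minimal Python sketch that enumerates them, assuming one sample per Sunday with no gaps (the function name `weekly_sample_dates` is illustrative and not part of the archive):

```python
from datetime import date, timedelta

def weekly_sample_dates(first: date, last: date) -> list[date]:
    """Return every date from first to last (inclusive) in 7-day steps."""
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += timedelta(days=7)
    return dates

samples = weekly_sample_dates(date(2025, 12, 21), date(2026, 2, 1))
for d in samples:
    # weekday() == 6 corresponds to Sunday
    print(d.isoformat(), d.strftime("%A"))
```

This is only a consistency check on the date range; the actual file names inside the archive are documented in the archive itself.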
Dataset Tiers
All tiers contain the same number of observations. They differ only in column structure and depth.
Tier 1 — Explorer
- 19 flat columns
- Aggregated sentiment counts and averages
- Spot prices, spreads, liquidity, and quality scores
- Designed for lightweight inspection, dashboards, and general analysis
Tier 2 — Analyst
- Extends Tier 1 with nested columns
- Detailed sentiment aggregates, author statistics, and engagement metrics
- Designed for deeper behavioral and cross-sectional analysis
Tier 3 — Researcher
- Extends Tier 2 with nested futures and microstructure data
- Includes 700+ spot price samples per observation window (10-second resolution)
- Multi-window sentiment, diagnostics, and futures positioning data
- Designed for research, validation, and archival analysis
Note: High-frequency (10-second) spot price samples are available only in Tier 3.
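The "700+" figure is consistent with the stated window length. As a rough check, assuming a nominal window of exactly 2 hours (the description says approximately 2 hours) sampled every 10 seconds:

```python
def expected_samples(window_seconds: int, resolution_seconds: int) -> int:
    """Number of evenly spaced samples that fit in one observation window."""
    return window_seconds // resolution_seconds

# Assumption: a nominal 2-hour window; real windows may be slightly shorter or longer.
print(expected_samples(2 * 60 * 60, 10))  # 720
```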
Intended Use
This dataset is intended for:
- Market structure research
- Behavioral and sentiment analysis
- Liquidity and execution context studies
- Exploratory and descriptive analytics
Limitations & Ethics
- Observational data only
- No trading advice, predictions, or signal generation
- No individual social media posts or personal data are included
- All sentiment data is aggregated and anonymized
Access
- Free weekly samples: github.com/SiCkGFX/instrumetriq-public
- Methodology: instrumetriq.com/research
Full access is available by subscription at instrumetriq.com/access. An interactive demo is available as a Colab notebook.
Notes
Methods
Market data is sourced from the Binance spot market via the public REST API.
Spot prices, bid–ask spreads, and liquidity-related metrics are sampled internally at high frequency and aggregated into fixed observation windows.
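To illustrate what window-level aggregation of this kind can look like, here is a minimal Python sketch. It is not Instrumetriq's pipeline: the input field names (`bid`, `ask`), the output column names, and the choice of statistics are all assumptions made for the example. It uses the standard definitions of mid price, (bid + ask) / 2, and relative spread in basis points, (ask - bid) / mid × 10,000.

```python
from statistics import mean, median

def aggregate_window(samples: list[dict]) -> dict:
    """Collapse high-frequency quote samples into one window-level row."""
    mids = [(s["bid"] + s["ask"]) / 2 for s in samples]
    # Relative spread in basis points: (ask - bid) / mid * 10,000
    spreads_bps = [
        (s["ask"] - s["bid"]) / mid * 10_000
        for s, mid in zip(samples, mids)
    ]
    return {
        "n_samples": len(samples),
        "mid_first": mids[0],
        "mid_last": mids[-1],
        "mid_mean": mean(mids),
        "spread_bps_median": median(spreads_bps),
    }

# Hypothetical quotes standing in for 10-second samples within one window.
quotes = [
    {"bid": 99.0, "ask": 101.0},
    {"bid": 100.0, "ask": 102.0},
    {"bid": 101.0, "ask": 103.0},
]
row = aggregate_window(quotes)
print(row)
```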
Social sentiment data is sourced from publicly available X (Twitter) posts.
Posts are collected continuously and classified using a hybrid transformer-based sentiment system. Sentiment outputs are aggregated into per-window counts and summary statistics.
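The reduction from per-post classifications to per-window counts and summary statistics can be sketched as follows. The label set (`positive`, `neutral`, `negative`), the numeric score, and the output field names are assumptions for illustration; the classifier itself is not shown and the dataset's actual schema is described in its schema documentation.

```python
from collections import Counter

def aggregate_sentiment(scored_posts: list[tuple[str, float]]) -> dict:
    """Reduce per-post (label, score) pairs to window-level counts and a mean score."""
    counts = Counter(label for label, _ in scored_posts)
    scores = [score for _, score in scored_posts]
    return {
        "posts_total": len(scored_posts),
        "posts_positive": counts["positive"],
        "posts_neutral": counts["neutral"],
        "posts_negative": counts["negative"],
        # An empty window has no defined mean score.
        "score_mean": sum(scores) / len(scores) if scores else None,
    }

# Hypothetical classifier output for one observation window.
window = [("positive", 0.5), ("negative", -0.5), ("positive", 1.0), ("neutral", 0.0)]
summary = aggregate_sentiment(window)
print(summary)
```

Only the aggregated row leaves the function, which mirrors the statement that no individual posts are exposed.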
Each tracked asset is monitored in rolling observation cycles of approximately **2 hours**, producing one observation per asset per cycle.
All tiers share the same observation timing and coverage.
High-frequency spot price samples (10-second resolution) are retained **only in the highest dataset tier**.
Lower tiers expose aggregated spot and sentiment statistics only.
No raw social media content, user identifiers, or personally identifiable information are included.
The dataset is strictly observational and descriptive in nature.
Files
- Archive size: 76.2 MB
- Checksum: md5:fd75e86b18ea23de4caab253427d86fc
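After downloading, the archive can be checked against the MD5 checksum listed above. A minimal Python sketch using only the standard library; the helper names are illustrative, and the local file path is whatever name the download was saved under:

```python
import hashlib
from pathlib import Path

# Checksum listed for the archive on this record.
EXPECTED_MD5 = "fd75e86b18ea23de4caab253427d86fc"

def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download(Path(...))` with the path of the downloaded file returns `True` only if the file matches the published checksum. MD5 is suitable here for detecting corrupted or incomplete downloads, not as a security guarantee.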
Additional details
Related works
- Is documented by
- Other: https://instrumetriq.com/dataset (URL)
- Is supplemented by
- Dataset: https://github.com/SiCkGFX/instrumetriq-public (URL)
Dates
- Collected: 2025-12-21/2026-02-01 (data collection period covered by the samples in this archive)