Published April 22, 2026 | Version 1.0.0
Preprint | Open Access

Empirical Audit of Frontier Vision-Language Models for Candlestick-Chart Pattern Recognition

  • Prince of Songkla University

Description

An empirical audit evaluating whether frontier vision-language models (VLMs) can serve as pattern-recognition experts inside a production cryptocurrency signal pipeline. The study was motivated by QuantAgent (Xiong et al., 2025; arXiv:2509.09995), whose PatternAgent component renders candlestick charts as images and prompts a VLM with a 16-label glossary of classical chart patterns.

Methodology. Four frontier VLMs across two vendors were evaluated: Anthropic Claude Haiku 4.5, Claude Sonnet 4.6, and Claude Opus 4.7, plus Google Gemini 3 Flash Preview. The test set consisted of 40 verified production crypto signals (20 take-profit hits, 20 stop-loss hits; 20 long, 20 short; 25 unique coins) plus a 5-fixture breadth pass with curated ground-truth patterns, giving 215 API calls in total, all logged end-to-end. Charts were rendered deterministically via node-canvas (48 bars at the 4H timeframe plus 45 bars of 1D context) from point-in-time Hyperliquid OHLCV data. Total audit cost: USD 1.16.
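The charts are rendered with node-canvas. As a point of reference, the sketch below shows what a deterministic candlestick renderer of that kind can look like; it is not the paper's actual code, and the layout constants (canvas size, candle width, colors) are illustrative assumptions.

```ts
import { createCanvas } from "canvas";

interface Bar { open: number; high: number; low: number; close: number; }

// Deterministic render: identical bars in, byte-identical PNG out, so any
// variance in VLM answers is attributable to the model, not the image.
function renderCandles(bars: Bar[], width = 960, height = 480): Buffer {
  const canvas = createCanvas(width, height);
  const ctx = canvas.getContext("2d");
  ctx.fillStyle = "#ffffff";
  ctx.fillRect(0, 0, width, height);

  const lo = Math.min(...bars.map((b) => b.low));
  const hi = Math.max(...bars.map((b) => b.high));
  const toY = (price: number) => height - ((price - lo) / (hi - lo)) * height;
  const slot = width / bars.length;

  bars.forEach((b, i) => {
    const x = i * slot + slot / 2;
    const color = b.close >= b.open ? "#2e7d32" : "#c62828"; // up green, down red
    ctx.strokeStyle = color;
    ctx.fillStyle = color;
    // High-low wick.
    ctx.beginPath();
    ctx.moveTo(x, toY(b.high));
    ctx.lineTo(x, toY(b.low));
    ctx.stroke();
    // Open-close body; keep at least 1 px so dojis remain visible.
    const top = toY(Math.max(b.open, b.close));
    const bodyHeight = Math.max(1, Math.abs(toY(b.open) - toY(b.close)));
    ctx.fillRect(x - slot * 0.3, top, slot * 0.6, bodyHeight);
  });

  return canvas.toBuffer("image/png"); // e.g. the 48-bar 4H panel
}
```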

Key findings.

  - Pattern-name accuracy: 1 correct out of 215 calls (0.5%) across the entire audit.

  - Direction accuracy at n=37: Haiku 4.5 51.4%, Gemini 3 Flash 51.4%, Opus 4.7 57.1%. The Wilson 95% confidence intervals ([35.9%, 66.6%] at 51.4%; [40.9%, 72.0%] at 57.1%) all contain the 50% coin-flip baseline; the interval arithmetic is sketched after this list.

  - Confidence calibration: signed point-biserial r ≈ 0 for all three audited models; reported confidence does not discriminate correct from incorrect direction calls.

  - Structural directional bias: Gemini 3 Flash predicted LONG on 17 of 17 long fixtures and on 18 of 20 short fixtures, a 90 pp long/short gap. Opus exhibited a 49 pp gap; Haiku, a 13.8 pp gap. The bullish prior is consistent across models and most extreme on Gemini.
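The interval and calibration arithmetic behind these findings uses textbook formulas. Below is a minimal sketch; the count of 19 correct calls for Haiku is inferred from the rounded 51.4% at n=37 and is an assumption, not a figure stated in this abstract.

```ts
// Wilson 95% score interval for a binomial proportion (Wilson, 1927).
function wilson(correct: number, n: number, z = 1.96): [number, number] {
  const p = correct / n;
  const z2 = z * z;
  const center = (p + z2 / (2 * n)) / (1 + z2 / n);
  const half =
    (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / (1 + z2 / n);
  return [center - half, center + half];
}

// Point-biserial correlation: Pearson r between a 0/1 outcome (direction
// call correct?) and a continuous score (the model's reported confidence).
function pointBiserial(outcome: number[], score: number[]): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const mx = mean(outcome);
  const my = mean(score);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < outcome.length; i++) {
    const dx = outcome[i] - mx;
    const dy = score[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Haiku 4.5: 51.4% at n=37 is consistent with 19 correct calls.
const [lo, hi] = wilson(19, 37);
console.log(`[${(lo * 100).toFixed(1)}%, ${(hi * 100).toFixed(1)}%]`);
// -> [35.9%, 66.6%], which straddles the 50% coin-flip baseline.

// Gemini's long/short gap from the counts above:
// 17/17 longs called LONG, but only 2/20 shorts called SHORT.
console.log(`${((17 / 17 - 2 / 20) * 100).toFixed(0)} pp`); // -> 90 pp
```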

Conclusion. As of April 2026, no tested frontier VLM produces a directional signal usable as a soft prior in a production trading pipeline. The architectural hypothesis developed in the paper is that language-first VLM projections degrade on the precise counting and compositional visual relationships that candlestick-pattern recognition requires. Practitioners considering QuantAgent-style integrations should verify performance against balanced, out-of-sample test data before deploying; one such check is sketched below.
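One concrete shape for that verification step follows; the 95% level and the example counts are assumptions, not figures from the paper. The idea is to enable a VLM directional prior only when the Wilson lower bound of its accuracy on a balanced hold-out set clears the coin-flip baseline.

```ts
// Wilson 95% lower bound of a hold-out accuracy estimate.
function wilsonLowerBound(correct: number, n: number, z = 1.96): number {
  const p = correct / n;
  const z2 = z * z;
  const center = (p + z2 / (2 * n)) / (1 + z2 / n);
  const half =
    (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / (1 + z2 / n);
  return center - half;
}

// Gate: only feed the VLM's direction call into the pipeline as a soft
// prior if the balanced out-of-sample evidence beats the 50% baseline.
const enableVlmPrior = (correct: number, n: number): boolean =>
  wilsonLowerBound(correct, n) > 0.5;

console.log(enableVlmPrior(19, 37));  // false: lower bound ~0.359
console.log(enableVlmPrior(70, 100)); // true:  lower bound ~0.604
```

On the numbers reported in this audit, no tested model passes such a gate.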

Supplementary materials.           

  - Gist with full raw data tables: https://gist.github.com/roman-rr/c1cd675f7c35b68ae5ac281c30080166

  - Reproducibility discussion (GitHub issue on QuantAgent repository): https://github.com/Y-Research-SBU/QuantAgent/issues/21

  - Sample rendered charts (public repository): https://github.com/roman-rr/trading-skills/tree/main/audits/vision-llm-charts-2026-04

Files

vision-llm-chart-audit-preprint.pdf (655.9 kB; md5:a695e1e206158191f4eaa5ffbe6c6457)

Additional details

Related works

Cites
Preprint: arXiv:2509.09995 (arXiv)
Is documented by
https://github.com/Y-Research-SBU/QuantAgent/issues/21 (GitHub issue)
Is supplement to
https://gist.github.com/roman-rr/c1cd675f7c35b68ae5ac281c30080166 (gist)

References

  • Xiong, F., Zhang, X., Feng, A., Sun, S., & You, C. (2025). QuantAgent: A multi-agent LLM framework for chart-pattern trading. arXiv:2509.09995v3.
  • Rahmanzadehgervi, P., Bolton, L., Taesiri, M. R., & Nguyen, A. T. (2024). Vision language models are blind. arXiv:2407.06581.
  • Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212.
  • Bulkowski, T. N. (2005). Encyclopedia of chart patterns (2nd ed.). Wiley.