Published May 17, 2026 | Version v1
Preprint Open

Orchid 1.0: A Reproducible Recipe for Aligned Ternary-Weight Language Models on Consumer Hardware

  • 1. Independent Researcher, Bogotá, Colombia

Description

We present Orchid 1.0, a 2-billion-parameter ternary-weight language model aligned through a three-stage LoRA pipeline (reasoning SFT, identity-and-knowledge SFT, and Odds-Ratio Preference Optimization) on a single RTX 3050 laptop with 4 GB of VRAM. We document each design decision, memory-management technique, and recovery procedure that made the training feasible on this hardware.
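
For readers unfamiliar with the third stage, the following is a minimal Python sketch of the Odds-Ratio Preference Optimization objective as it is commonly formulated: a supervised loss on the chosen response plus a weighted odds-ratio penalty. It is not taken from the Orchid training code. The function names, the length-averaged log-probability inputs, and the weight `lam=0.1` are illustrative assumptions.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the length-averaged log-probability of a
    response, so it must be strictly negative (p < 1).
    """
    if avg_logp >= 0.0:
        raise ValueError("average log-probability must be negative")
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio term.

    lam is an illustrative default, not a value reported for Orchid 1.0.
    """
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    odds_ratio_term = softplus(-ratio)  # equals -log(sigmoid(ratio))
    sft_term = -avg_logp_chosen         # negative log-likelihood of chosen
    return sft_term + lam * odds_ratio_term
```

Holding the chosen response fixed, the loss falls as the rejected response becomes less likely, which is the preference signal this stage relies on.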

We then describe and resolve the ternary merge problem, the destructive interaction between LoRA deltas and ternary weight quantization. This problem motivated the construction of ternative.cpp, a purpose-built C++ inference engine that loads a base I2_S GGUF and a separate LoRA adapter GGUF and merges them at full precision at load time. The engine supports CPU (AVX2, OpenMP) and GPU (CUDA 12.6) execution and includes an OpenAI-compatible HTTP server.
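
One way to see why a naive merge can be destructive is the toy example sketched below in Python, under stated assumptions: the base weights are a single scale times codes in {-1, 0, +1}, the merged matrix is rounded back onto the same grid, and every LoRA delta entry is smaller than half the scale. The matrices, the scale 0.04, and all function names are hypothetical and are not taken from ternative.cpp or the I2_S format; the paper's own analysis may differ.

```python
def dequantize(codes, scale):
    """Expand ternary codes into float weights: scale * code."""
    return [[scale * c for c in row] for row in codes]


def ternarize(weights, scale):
    """Round each weight to the nearest code in {-1, 0, +1} at a fixed scale."""
    return [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]


def lora_delta(b, a, scaling):
    """Dense LoRA update: scaling * (B @ A), with B (out x r) and A (r x in)."""
    rank = len(a)
    return [[scaling * sum(b[i][k] * a[k][j] for k in range(rank))
             for j in range(len(a[0]))] for i in range(len(b))]


def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


# Hypothetical 2x4 ternary layer and rank-1 adapter (illustrative values).
base_codes = [[1, 0, -1, 1], [0, -1, 1, 0]]
base_scale = 0.04
w_base = dequantize(base_codes, base_scale)

lora_b = [[0.1], [-0.15]]
lora_a = [[0.05, -0.05, 0.1, 0.02]]
delta = lora_delta(lora_b, lora_a, scaling=1.0)

# Full-precision merge keeps the delta; re-quantizing rounds it away.
merged_fp = add(w_base, delta)
requant_codes = ternarize(merged_fp, base_scale)
requant = dequantize(requant_codes, base_scale)

print("codes unchanged after re-quantization:", requant_codes == base_codes)
print("largest delta entry lost:", max_abs_diff(merged_fp, requant))
```

In this toy case the re-quantized codes equal the base codes, so the adapter's contribution is rounded away entirely, whereas keeping the sum of base weights and delta in floating point retains it. A load-time, full-precision merge is meant to keep that contribution.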

On four standard benchmarks, Orchid 1.0 scores 56.0% on ARC-Challenge (+6.1 pp over the BitNet base), 52.0% on HellaSwag, 74.0% on WinoGrande, and 38.6% on MMLU.

All artifacts are openly available:
- Model: https://huggingface.co/MicheRomChis/orchid-1.0
- Inference engine: https://github.com/michelangeloromerochisco/ternative
- Training code: https://github.com/michelangeloromerochisco/orchid-1.0

Files

orchid-1-0-technical-paper.pdf (516.2 kB)
md5:9e07d969ac08b0fbe0d33e7040d10754

Additional details