Tiny Scale Is All I Can Spare To Play With Transformer

Srivastava, Srijan

doi:10.5281/zenodo.20631957

Published June 10, 2026 | Version v1

Preprint Open

Srivastava, Srijan¹

1. Independent Researcher

The introduction of the Transformer neural network architecture in the famous Attention Is All You Need paper set off a huge wave of AI development in recent years. Scaled dot-product attention allows information to be processed with higher efficiency and quality than the previous RNN-based models could manage. However, Transformer-based models come with their own challenges, particularly parameter efficiency for tiny models with ≤ 5M parameters. At such a small scale, a Transformer model essentially uses more parameters than it really should. This sub-ten-million-parameter domain is very underexplored, and for good reasons, but I wanted to explore it anyway. In this paper I introduce Silia, a novel Transformer architecture designed for efficient modelling and classification tasks under a severe parameter budget. Benchmarked against the GPT-2 architecture (Andrej Karpathy's nanoGPT project) with the same "base" hyperparameters, training data, and compute budget, Silia achieves comparable loss and generation quality with significantly fewer parameters.
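To make the parameter-efficiency concern concrete, here is a minimal sketch that counts the parameters of a GPT-2-style decoder at tiny scale. It is not taken from the paper and says nothing about how Silia itself is built. The function name `gpt2_param_count` and the example configuration (4 layers, width 64, context 256) are hypothetical, and the count assumes a nanoGPT-like layout with no bias terms and an output head tied to the token embedding. The vocabulary size 50257 is the standard GPT-2 BPE vocabulary, which may differ from the tokenizer used in the experiments.

```python
def gpt2_param_count(vocab_size, block_size, n_layer, d_model, tied_head=True):
    """Approximate parameter count for a GPT-2-style decoder.

    Assumptions (not from the paper): no bias terms, MLP hidden width 4 * d_model,
    LayerNorm with a gain vector only, learned positional embeddings.
    """
    tok_emb = vocab_size * d_model            # token embedding table
    pos_emb = block_size * d_model            # learned position embedding table
    attn = 4 * d_model * d_model              # Q, K, V and output projections
    mlp = 8 * d_model * d_model               # d -> 4d and 4d -> d projections
    norms = 2 * d_model                       # two LayerNorm gains per block
    blocks = n_layer * (attn + mlp + norms) + d_model  # plus the final LayerNorm
    head = 0 if tied_head else vocab_size * d_model    # untied head adds a second table
    embeddings = tok_emb + pos_emb
    return {
        "embeddings": embeddings,
        "blocks": blocks,
        "head": head,
        "total": embeddings + blocks + head,
    }


# Hypothetical tiny configuration, chosen only for illustration.
counts = gpt2_param_count(vocab_size=50257, block_size=256, n_layer=4, d_model=64)
embedding_share = counts["embeddings"] / counts["total"]

print(f"total parameters:  {counts['total']:,}")
print(f"embedding share:   {embedding_share:.1%}")
```

Under these assumptions, the embedding tables account for most of the parameters and the Transformer blocks that do the actual sequence modelling get only a small fraction. This is one way to read the claim that a tiny Transformer spends more parameters than it should.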

Files

Name	Size
Silia.pdf md5:a38e8387af376dd6f975816dc1de5e1f	323.1 kB

Additional details

Submitted: 2026-06

Submitted the paper

Repository URL: https://github.com/SrijanSriv211/Silia
Programming language: Python
Development Status: Active

References

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Noam Shazeer (2020). GLU variants improve transformer. arXiv preprint arXiv:2002.05202.
Andrej Karpathy (2022). nanoGPT. GitHub. https://github.com/karpathy/nanogpt
