There is a newer version of the record available.

Published March 21, 2021 | Version 1.0
Software Open

GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow


GPT-Neo is an implementation of model-parallel and data-parallel GPT-2-like and GPT-3-like models that uses Mesh TensorFlow for distributed support. The codebase is designed for TPUs. It should also work on GPUs, though we do not recommend that hardware configuration.
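To make the two kinds of parallelism named above concrete, here is a minimal plain-Python sketch. It is not taken from the GPT-Neo codebase and does not use the Mesh TensorFlow API. The function names (`matmul`, `data_parallel_matmul`, `model_parallel_matmul`) and the `num_devices` parameter are hypothetical, and "devices" are simulated by a loop. It only illustrates that splitting the batch (data parallelism) or splitting the weight matrix (model parallelism) yields the same result as the unsplit computation.

```python
# Illustrative only: a plain-Python stand-in for data vs. model parallelism.
# None of these names come from GPT-Neo or Mesh TensorFlow (assumption).

def matmul(x, w):
    """Multiply x (n x k) by w (k x m); both are lists of row lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

def data_parallel_matmul(x, w, num_devices):
    """Split the batch (rows of x) across devices; each holds all of w."""
    shard = (len(x) + num_devices - 1) // num_devices  # rows per device
    out = []
    for d in range(num_devices):
        # Each simulated device processes its own slice of the batch.
        out.extend(matmul(x[d * shard:(d + 1) * shard], w))
    return out

def model_parallel_matmul(x, w, num_devices):
    """Split the weights (columns of w) across devices; each sees all of x."""
    shard = (len(w[0]) + num_devices - 1) // num_devices  # columns per device
    parts = []
    for d in range(num_devices):
        # Each simulated device holds only a slice of the weight columns.
        w_shard = [row[d * shard:(d + 1) * shard] for row in w]
        parts.append(matmul(x, w_shard))
    # Concatenate every device's output columns, row by row.
    return [sum((p[i] for p in parts), []) for i in range(len(x))]

x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # batch of 4 examples, 2 features
w = [[1, 0, 2], [0, 1, 3]]             # 2 x 3 weight matrix
reference = matmul(x, w)
assert data_parallel_matmul(x, w, 2) == reference
assert model_parallel_matmul(x, w, 2) == reference
```

In a real distributed setup the per-device work would run concurrently on separate accelerators and the final concatenation would be a communication step; the loop here only stands in for that layout.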



Files (86.6 kB)


Additional details

Related works

Is cited by
Preprint: arXiv:2105.09938 (arXiv)
Preprint: arXiv:2107.13586 (arXiv)
Preprint: arXiv:2107.03374 (arXiv)
Preprint: arXiv:2107.06499 (arXiv)
Is supplement to (URL)