Published June 12, 2025 | Version v3.0.0
Software Open

e-p-armstrong/augmentoolkit: Augmentoolkit 3.0

  • 1. Instituto Nacional de Ciência e Tecnologia em Democracia Digital
  • 2. Roam Research

Description

Augmentoolkit 3.0 is essentially an entirely new project.

Before, we had 3 pipelines. Now we have 16.

Before, we just generated data. Now Augmentoolkit also trains whole LLMs automatically, using autogenerated training configs. Data generation can be done locally and efficiently on consumer hardware, thanks to a custom-trained dataset generation model.

The factual finetuning process's quality has been completely revolutionized during development -- three separate times, each building on the one before it.

A full changelog is impractical, since everything has changed. Every abstraction has been improved. Every way of using the tool has been streamlined. Every pipeline is better. Every outcome is higher-quality and more efficiently delivered.

Instead of a changelog, refer to the documentation, since diffs don't mean much when the project has been effectively rewritten from the ground up.

However, if you previously forked the project to build your own data pipelines, do not despair -- porting pipelines to New Augmentoolkit is easy, and the documentation (docs/...) includes the pipeline conventions, an abstractions primer, and a new pipeline primer to guide you through the process. Alternatively, you can get help on the Discord.

Augmentoolkit is now the best way in the world to make custom data, and by extension, custom models.

Happy Hacking!

Files

e-p-armstrong/augmentoolkit-v3.0.0.zip

Files (10.2 MB)

Size: 10.2 MB
md5:249fe021256a7aaa76ba071a4e6a2f41

Additional details

Related works