PyHDL-Eval: An LLM Evaluation Framework for Hardware Design Using Python-Embedded DSLs

Batten, Christopher; Pinckney, Nathaniel; Liu, Mingjie; Ren, Haoxing; Khailany, Brucek

doi:10.5281/zenodo.13117553

Published September 11, 2024 | Version v1

Conference paper Open


1. Cornell University
2. Nvidia (United States)

There has been a recent trend towards embedding hardware design and verification frameworks within Python to improve the productivity of hardware engineers. At the same time, there is significant recent work exploring the use of large language models (LLMs) to improve key chip design and verification tasks. All of this prior work has focused on LLMs in the context of traditional hardware description languages. This paper describes PyHDL-Eval, a new framework for evaluating LLMs on specification-to-RTL tasks in the context of Python-embedded DSLs. The framework includes 168 problems developed using an ontological approach to cover 19 categories of RTL design. The framework also includes Verilog reference solutions, Verilog test benches, Python test scripts, and workflow orchestration scripts. We use our framework to conduct a detailed case study comparing five LLMs (CodeGemma 7B, Llama3 8B/70B, GPT4, and GPT4 Turbo) targeting Verilog and five Python-embedded DSLs (PyMTL3, PyRTL, MyHDL, Migen, and Amaranth). Our results demonstrate the promise of in-context learning (ICL) when applied to smaller models (e.g., pass rate for CodeGemma 7B improves from 14.9% to 32.7% on Verilog) and Python-embedded DSLs (e.g., pass rate for Llama3 70B improves from 0.6% to 33.0% on PyMTL3). We find LLMs perform equally well or better when targeting Verilog as compared to Python-embedded DSLs (e.g., pass rate for GPT4 Turbo is 72.3% on Verilog and 30.0-62.2% on the Python-embedded DSLs), even though these DSLs are embedded within a popular general-purpose host language. PyHDL-Eval will serve as a useful framework to drive continued research at the intersection of Python-embedded DSLs and LLMs.

The attached Docker image includes everything required to reproduce all of the results in the paper:

Source code for the PyHDL-Eval framework (Verilog reference solutions, Verilog test benches, Python test scripts, workflow orchestration scripts)
Pre-installed binaries for all tools (GCC 13.2.0, Make 4.3, Icarus Verilog simulator 12.0, Verilator Verilog simulator 5.020, Python 3.12.3)
Pre-installed Python packages for all five Python-embedded DSLs (PyMTL3, PyRTL 0.11.1, MyHDL 0.11.45, Migen 0.9.2, Amaranth 0.4.5)
RTL modules pre-generated using all five LLMs (CodeGemma 7B, Llama3 8B/70B, GPT4, GPT4 Turbo)

Please refer to the README file for how to load the Docker image, test the framework, run all of the simulations, and then generate the result data tables.

Files (350.8 MB)

Name	MD5	Size
pyhdl-eval-artifact.tar.gz	10c54bd8f326c95619b4afdd1a577073	350.8 MB
README.md	d986629c9a017dd9b06c7e0489bc7201	11.5 kB
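
Because an MD5 checksum is listed for each file, a download can be checked before the Docker image is loaded. The following is a minimal sketch, not part of the artifact itself: it assumes both files were saved under their listed names in the current directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on this record.
EXPECTED_MD5 = {
    "pyhdl-eval-artifact.tar.gz": "10c54bd8f326c95619b4afdd1a577073",
    "README.md": "d986629c9a017dd9b06c7e0489bc7201",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so the 350.8 MB archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists in `directory`
    and its MD5 digest matches the listed value, else False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING or MISMATCH'}")
```

A mismatch most likely indicates an incomplete or corrupted download, in which case the file should be fetched again before following the README.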

