TorchMetrics - Measuring Reproducibility in PyTorch
Description
A major problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers, or early stopping, which in turn influences the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation will differ based on the specific interpolation method used.

There have been a few attempts at tackling these reproducibility issues. Papers With Code links research code with its corresponding paper. Similarly, arXiv recently added a code and data section that links both official and community code to papers. However, these methods rely on the paper's code being made publicly accessible, which is not always possible. Our approach is to provide the de-facto reference implementation for metrics. This enables proprietary work to remain comparable as long as it uses our reference implementations.

We introduce TorchMetrics, a general-purpose metrics package that covers a wide variety of tasks and domains used in the machine learning community. TorchMetrics provides standard classification and regression metrics, as well as domain-specific metrics for audio, computer vision, natural language processing, and information retrieval. Our process for adding a new metric is as follows: first, we integrate a well-tested and established third-party library. Once we have verified the implementations and written tests against them, we re-implement the metric in native PyTorch to enable hardware acceleration and remove any bottlenecks in inter-device transfer.
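To illustrate the package's interface, the following is a minimal sketch of the module-based metric API, assuming a torchmetrics 0.11.x install (the exact arguments, such as the `task` keyword on `Accuracy`, follow that version's API). A metric object accumulates state across batches and can be moved to an accelerator with the usual PyTorch `.to(device)` call:

```python
import torch
import torchmetrics

# Module-based metric: keeps internal state across batches and, in
# distributed settings, synchronizes that state across devices.
metric = torchmetrics.Accuracy(task="multiclass", num_classes=5)
# metric = metric.to("cuda")  # metrics run on accelerators like any nn.Module

for _ in range(3):  # stand-in for an evaluation loop
    preds = torch.randn(10, 5).softmax(dim=-1)  # simulated model outputs
    target = torch.randint(0, 5, (10,))         # simulated labels
    batch_acc = metric(preds, target)           # accuracy for this batch only

epoch_acc = metric.compute()  # accuracy aggregated over all seen batches
metric.reset()                # clear state before the next evaluation run
```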
Notes
Files
(942.3 kB)

Name | Size | MD5
---|---|---
Lightning-AI/metrics-v0.11.3.zip | 942.3 kB | 0e6e8ea50b448e20fcc644637fc45bc2
Additional details
Related works
- Is supplement to: https://github.com/Lightning-AI/metrics/tree/v0.11.3 (URL)