Published September 15, 2025 | Version v1
Journal article | Open

Open Raman spectral library for biomolecule identification

Description

Raman spectroscopy combined with Multivariate Curve Resolution (MCR) analysis is widely used in biomedical applications. However, assigning biomolecules to the components extracted by MCR can be challenging because no open Raman spectral library for biomolecules exists. Raman experts typically identify unmixed component spectra as biomolecules by comparing them with reference spectra from the literature. This process is time-consuming and subject to human bias. In this work, we created an open Raman spectral database of 140 biomolecules by implementing an algorithm that digitizes the spectral plots and the most relevant peaks from articles available in the literature. Additionally, we implemented two search algorithms. The first uses the spectral linear kernel or cosine similarity on the full spectra. The second is based on peak matching and scores each candidate by the intersection over union of the matched peaks, with a defined tolerance for peak matching. Our experimental validation showed 100 % top-10 accuracy in molecule identification (e.g. collagen) and 100 % accuracy in molecule type identification (e.g. protein), both in measurements of pure biomolecules and when replicating results from prior studies. Objectively narrowing the identification to the 10 top-ranked candidates and providing type identification can significantly reduce both the time required for visual identification and the need to purchase reference component samples. We publish our spectral library as an open-source tool so that it can be expanded collaboratively by the research community. It is available at: https://github.com/mteranm/ramanbiolib.
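
The two search strategies described above can be illustrated with a minimal Python sketch. It is not the code from the published library: the function names (`cosine_similarity`, `peak_iou`, `rank_candidates`), the greedy one-to-one peak matching, and the default tolerance of 5 cm^-1 are assumptions made for illustration, and the spectra are assumed to be sampled on a common wavenumber axis.

```python
import numpy as np


def cosine_similarity(query, reference):
    """Cosine similarity of two spectra sampled on the same wavenumber axis."""
    q = np.asarray(query, dtype=float)
    r = np.asarray(reference, dtype=float)
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    # An all-zero spectrum has no direction; score it as 0.
    return float(q @ r / denom) if denom > 0 else 0.0


def peak_iou(query_peaks, reference_peaks, tolerance=5.0):
    """Intersection over union of matched peak positions (in cm^-1).

    Assumption: each query peak is greedily matched to the nearest
    still-unmatched reference peak, and the pair counts as matched
    only if the two positions differ by at most `tolerance`.
    """
    remaining = sorted(reference_peaks)
    matched = 0
    for peak in sorted(query_peaks):
        if not remaining:
            break
        nearest = min(remaining, key=lambda ref: abs(ref - peak))
        if abs(nearest - peak) <= tolerance:
            matched += 1
            remaining.remove(nearest)  # enforce one-to-one matching
    union = len(query_peaks) + len(reference_peaks) - matched
    return matched / union if union else 0.0


def rank_candidates(scores, top_k=10):
    """Return the names of the top_k highest-scoring library entries."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

With illustrative peak lists, `peak_iou([1004, 1450, 1660], [1002, 1448, 1300])` matches two of the three query peaks and divides by a union of four peaks. Scoring every library entry this way and passing the resulting name-to-score dictionary to `rank_candidates` gives a top-10 shortlist of the kind used in the validation.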

Files

Open_Raman_spectral_library.pdf (3.6 MB)
md5:0f7877299e4ec3dc7e5cd97b1d63278b