Published October 12, 2023 | Version 1.6.0
Software Open

libscientific: A Powerful C Library for Multivariate Analysis

  • 1. Independent Researcher

Description

Libscientific is a C framework for multivariate and other statistical analysis, written to be almost completely independent of common, well-established numerical libraries. The only exception is the LAPACK library, which is used solely to calculate left eigenvectors and eigenvalues.

The main goals of libscientific are:

  1. To provide a simple, small framework for multivariate analysis that can be used not only on regular computers but also on embedded systems
  2. To create a robust library of multivariate algorithms for any research or industrial application

Currently libscientific is able to compute:

  • Multivariate analysis

    • Principal Component Analysis (PCA) NIPALS algorithm [1]
    • Partial Least Squares (PLS) NIPALS algorithm [1]
    • Consensus PCA (CPCA) NIPALS algorithm [7]
    • Multiple Linear Regression (MLR) Ordinary least squares algorithm
    • Unfold PCA (UPCA) [2]
    • Unfold PLS (UPLS) [2]
    • Fisher LDA
  • Clustering

    • K-means++ (David Arthur modification) [3]
    • Hierarchical clustering
  • Object/Instance selection

    • Most Descriptive Compounds (MDC) [4]
    • Most Dissimilar Compounds (DIS) [5]
  • Statistical analysis

    • R2, MSE, MAE, RMSE, BIAS, Sensitivity, Positive Predictive Value
    • Yates analysis
    • Receiver operating characteristic (ROC)
    • Precision-Recall
    • Matrix-Matrix Euclidean, Manhattan, Cosine and Mahalanobis distances
  • Numerical analysis

    • Estimate of an integral over an xy region (numerical integration using the trapezoid rule)
    • Natural cubic spline interpolation and prediction
    • Ordinary Least-Squares (OrdinaryLeastSquares)
    • Linear Equation Solver (SolveLSE)
    • Singular value decomposition
  • Optimization

    • Nelder-Mead simplex algorithm

Moreover, for some algorithms the following validation methods can be run in parallel to reduce computation time:

  • Bootstrap k-fold Cross Validation (RGCV)
  • Leave-One-Out
  • Y-Scrambling [6]

Files

Files (653.7 kB)

md5:860dc13d31b47b54db8c1c06ef5747e0 (653.7 kB)

Additional details

References

  1. P. Geladi, B.R. Kowalski. Partial least-squares regression: a tutorial. Analytica Chimica Acta, Volume 185, 1986, pages 1-17. http://dx.doi.org/10.1016/0003-2670(86)80028-9
  2. S. Wold, P. Geladi, K. Esbensen and J. Öhman. Multi-way principal components- and PLS-analysis. Journal of Chemometrics, Volume 1, Issue 1, pages 41-56, January 1987. http://dx.doi.org/10.1002/cem.1180010107
  3. T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, A.Y. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2002, pages 881-892. http://dx.doi.org/10.1109/TPAMI.2002.1017616
  4. B.D. Hudson, R.M. Hyde, E. Rahr, J. Wood and J. Osman. Parameter Based Methods for Compound Selection from Chemical Databases. Quantitative Structure-Activity Relationships, Volume 15, Issue 4, pages 285-289, 1996. http://dx.doi.org/10.1002/qsar.19960150402
  5. J. Holliday, P. Willett. Definitions of "Dissimilarity" for Dissimilarity-Based Compound Selection. Journal of Biomolecular Screening, Volume 1, Number 3, 1996, pages 145-151. http://dx.doi.org/10.1177/108705719600100308
  6. R.D. Clark, P.C. Fox. Statistical variation in progressive scrambling. J Comput Aided Mol Des., 2004 Jul-Sep; 18(7-9): 563-76. http://dx.doi.org/10.1007/s10822-004-4077-z
  7. J.A. Westerhuis, T. Kourti and J.F. MacGregor. Analysis of multiblock and hierarchical PCA and PLS models. Journal of Chemometrics, 1998, 12, 301-321. http://dx.doi.org/10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S