Published August 23, 2023 | Version v1.2.3
Software | Open access

dcTensor: An R package for discrete matrix/tensor decomposition

  • 1. RIKEN ACCC BiT

Description

Matrix factorization (MF) is a widely used approach to extract significant patterns from a data matrix. MF is formalized as the approximation of a data matrix $X$ by the product of two factor matrices $U$ and $V$. Because this formulation has many degrees of freedom (the factorization is not unique), constraints are usually imposed on the solution. Non-negative matrix factorization (NMF), which constrains the factor matrices to be non-negative, is a widely used algorithm for decomposing non-negative data matrices. Because non-negative factors are easy to interpret and the decomposition results can be used directly for clustering, NMF has many applications in image processing, audio processing, and bioinformatics.
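To make the NMF formulation concrete, the sketch below implements the classic Lee and Seung multiplicative updates for the squared Frobenius error, assuming the convention $X \approx UV^{\top}$ with $U$ of size $I \times J$ and $V$ of size $K \times J$. It is a generic illustration, not dcTensor's implementation or API; the function name `nmf` and its parameters are hypothetical.

```python
import numpy as np


def nmf(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical helper: approximate a non-negative X (I x K) as U @ V.T,
    with U (I x rank) >= 0 and V (K x rank) >= 0, using Lee-Seung
    multiplicative updates for the squared Frobenius error."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    I, K = X.shape
    U = rng.random((I, rank))  # random non-negative start
    V = rng.random((K, rank))
    for _ in range(n_iter):
        # Multiplying by a ratio of non-negative terms keeps U and V non-negative;
        # eps guards against division by zero.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V
```

For example, with `X = A @ B.T` for random non-negative `A` and `B` of width 3, `nmf(X, rank=3)` should return factors whose product is close to `X`. As with any NMF, the factors are determined only up to scaling and permutation of their columns.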

A discrete version of NMF, called binary matrix factorization (BMF), is obtained by constraining the factor matrices extracted from the data matrix to binary values (e.g., {0,1}). BMF has recently attracted attention in several data science domains, such as market basket data, document-term data, Web click-stream data, DNA microarray expression profiles, and protein-protein complex interaction networks.
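One simple way to illustrate the binary constraint is alternating exact minimization. Assuming $X \approx UV^{\top}$ with the ordinary arithmetic product (some BMF variants use a Boolean product instead), fixing $V$ makes the rows of $U$ independent, so each row can be chosen by exhaustive search over all $2^{J}$ binary rows, and vice versa. This is a small-rank teaching sketch, not the algorithm used by dcTensor; `bmf` and `best_discrete_rows` are hypothetical names.

```python
import itertools

import numpy as np


def best_discrete_rows(X, F, candidates):
    """For every row x_i of X, return the candidate row c minimising
    ||x_i - c @ F.T||^2 (exhaustive search over `candidates`)."""
    recon = candidates @ F.T  # (n_candidates, K): one reconstruction per candidate
    dist = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
    return candidates[dist.argmin(axis=1)]


def bmf(X, rank, n_iter=20, seed=0):
    """Hypothetical helper: approximate X as U @ V.T with U, V in {0, 1}
    (ordinary arithmetic product), alternating exact updates of U and V."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # All 2**rank binary rows a factor row can take.
    cands = np.array(list(itertools.product([0, 1], repeat=rank)), dtype=float)
    # Random binary start for V; U is computed from it in the first step.
    V = rng.integers(0, 2, size=(X.shape[1], rank)).astype(float)
    for _ in range(n_iter):
        # Given V, the rows of U decouple: X ~ U V^T row by row.
        U = best_discrete_rows(X, V, cands)
        # Given U, the rows of V decouple: X^T ~ V U^T row by row.
        V = best_discrete_rows(X.T, U, cands)
    return U, V
```

Because each step minimizes exactly over one factor, the reconstruction error never increases, but the result can still be a local optimum; the $2^{J}$ candidate rows also limit this approach to small ranks.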

Although BMF is increasingly used, current data analysis requires further extensions. For example, we may need a ternary solution (e.g., {0,1,2}) instead of a binary one; here, I call this ternary matrix factorization (TMF). TMF could help extract ordered patterns, such as stages of disease severity. The discretization can also be applied to only one of the two factor matrices ($U$ or $V$); here, I call this semi-binary matrix factorization (SBMF) or semi-ternary matrix factorization (STMF). This extension helps extract discrete patterns from continuous-valued matrix data.

Finally, there is a growing demand to extend MF to the simultaneous factorization of multiple matrices or tensors (multi-dimensional arrays). Such heterogeneous data sets arise when multiple measurements sharing a common data structure are performed under different experimental conditions, so it would be convenient if discretization could also be applied to these data structures. To meet these requirements, I developed dcTensor, an R/CRAN package that implements several discrete matrix/tensor decomposition algorithms (https://cran.r-project.org/web/packages/dcTensor/index.html).
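The semi-binary (SBMF) and semi-ternary (STMF) settings described above can be sketched by discretizing only $U$: given $U$, the real-valued $V$ is an ordinary least-squares solution, and given $V$, each row of $U$ is chosen exhaustively from a value alphabet such as {0,1} or {0,1,2}. This again assumes $X \approx UV^{\top}$ and is an illustrative sketch rather than dcTensor's method; `semi_discrete_mf` and its `alphabet` parameter are hypothetical.

```python
import itertools

import numpy as np


def semi_discrete_mf(X, rank, alphabet=(0, 1, 2), n_iter=20, seed=0):
    """Hypothetical helper: approximate X (I x K) as U @ V.T, where the entries
    of U are restricted to `alphabet` and V is real-valued.
    alphabet=(0, 1) gives a semi-binary and (0, 1, 2) a semi-ternary variant."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Every row U may take: len(alphabet) ** rank candidates.
    cands = np.array(list(itertools.product(alphabet, repeat=rank)), dtype=float)
    U = cands[rng.integers(0, len(cands), size=X.shape[0])]  # random discrete start
    for _ in range(n_iter):
        # Continuous factor: least-squares solution of U @ V.T ~ X.
        V = np.linalg.lstsq(U, X, rcond=None)[0].T
        # Discrete factor: exhaustive search over candidate rows given V.
        recon = cands @ V.T
        dist = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
        U = cands[dist.argmin(axis=1)]
    return U, V
```

Replacing the least-squares step with the same exhaustive search over the alphabet would give a fully discrete (e.g., ternary) factorization; keeping $V$ continuous is what lets discrete patterns in $U$ explain continuous-valued data.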

Files

rikenbit/dcTensor-v1.2.3.zip (359.2 kB, md5:66c211d2e057d1c080c3d099902c324c)

Additional details

Related works