dcTensor: An R package for discrete matrix/tensor decomposition

doi:10.5281/zenodo.8275544

Published August 23, 2023 | Version v1.2.3

Resource type: Software | Access: Open

Koki Tsuyuzaki¹

1. RIKEN ACCC BiT

Matrix factorization (MF) is a widely used approach for extracting significant patterns from a data matrix. MF is formalized as the approximation of a data matrix $X$ by the matrix product of two factor matrices $U$ and $V$. Because this formalization has a large number of degrees of freedom, constraints are usually imposed on the solution. Non-negative matrix factorization (NMF), which constrains the factor matrices to be non-negative, is a widely used algorithm for decomposing a non-negative data matrix. Because non-negative factors are easy to interpret and the decomposition results can be used directly for clustering, NMF has many applications in image processing, audio processing, and bioinformatics.
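As a worked form of the formalization above, MF and NMF can be written as a least-squares problem. The dimensions $N$ and $M$, the rank $J$, the transpose convention $U V^{\top}$, and the Frobenius-norm objective are assumptions made here for illustration; the description itself only states that $X$ is approximated by the product of $U$ and $V$.

```latex
X \approx U V^{\top}, \qquad
X \in \mathbb{R}^{N \times M},\;
U \in \mathbb{R}^{N \times J},\;
V \in \mathbb{R}^{M \times J}

\min_{U \ge 0,\; V \ge 0} \; \lVert X - U V^{\top} \rVert_F^{2}
\qquad \text{(NMF, for } X \ge 0 \text{)}
```

With $J$ much smaller than $N$ and $M$, the columns of $U$ and $V$ act as a small set of shared patterns, which is why the factors are often read as soft cluster memberships.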

A discrete version of NMF can also be considered by requiring the factor matrices extracted from the data matrix to be binary (e.g., entries in {0,1}); this is called binary matrix factorization (BMF). BMF has recently been applied in several data science domains, to data such as market basket data, document-term data, Web click-stream data, DNA microarray expression profiles, and protein-protein complex interaction networks.
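To make the binary constraint concrete, here is a minimal sketch in plain Python that checks a hand-built binary factorization of a toy matrix. The matrices and helper names (`matmul_t`, `squared_error`, `is_binary`) are made up for illustration and are not part of dcTensor; the sketch only verifies a given factorization and does not implement any fitting algorithm.

```python
# Toy check of a binary matrix factorization X = U V^T with U, V in {0,1}.
# All data and helper names are hypothetical; nothing here is dcTensor's API.

def matmul_t(U, V):
    """Return U V^T for U (N x J) and V (M x J) given as lists of rows."""
    return [[sum(u[j] * v[j] for j in range(len(u))) for v in V] for u in U]

def squared_error(X, Y):
    """Sum of squared element-wise differences between two matrices."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def is_binary(A):
    """True if every entry of A is 0 or 1."""
    return all(a in (0, 1) for row in A for a in row)

# Toy data: rows 1-2 share one pattern, row 3 has another.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
U = [[1, 0], [1, 0], [0, 1]]          # 3 x 2: row-to-pattern membership
V = [[1, 0], [1, 0], [0, 1], [0, 1]]  # 4 x 2: column-to-pattern membership

X_hat = matmul_t(U, V)
error = squared_error(X, X_hat)
print(error)  # 0
```

Each column of `U` marks which rows belong to a pattern and each column of `V` marks which columns belong to it, so the binary factors can be read directly as a co-clustering of rows and columns.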

Although BMF is increasingly used, current data analysis calls for further extensions. For example, we may need a ternary solution (e.g., entries in {0,1,2}) instead of a binary one. Here, I call this ternary matrix factorization (TMF). TMF would help extract ordered patterns, such as stages of disease severity. It is also possible to discretize only one of the two factor matrices ($U$ or $V$); here I call this semi-binary matrix factorization (SBMF) or semi-ternary matrix factorization (STMF). This extension makes it possible to extract discrete patterns from continuous-valued matrix data. Finally, there is a growing demand to extend MF to the simultaneous factorization of multiple matrices or of tensors (higher-order arrays). Such heterogeneous data sets arise when multiple measurements with a common data structure are performed under different experimental conditions, so it is convenient if discretization is also available for these data structures. To meet these requirements, I developed dcTensor, an R/CRAN package that implements several discrete matrix/tensor decomposition algorithms (https://cran.r-project.org/web/packages/dcTensor/index.html).
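The matrix variants named above can be read as the same approximation $X \approx U V^{\top}$ under different constraint sets. The notation below is an assumption for illustration, not the package's formal definition: the dimensions $N$, $M$, $J$, the transpose convention, the choice of $U$ as the discretized factor in the semi- variants, and the non-negativity of the continuous factor are all chosen here for concreteness.

```latex
\begin{aligned}
\text{BMF:}  &\quad U \in \{0,1\}^{N \times J},   & V &\in \{0,1\}^{M \times J} \\
\text{TMF:}  &\quad U \in \{0,1,2\}^{N \times J}, & V &\in \{0,1,2\}^{M \times J} \\
\text{SBMF:} &\quad U \in \{0,1\}^{N \times J},   & V &\in \mathbb{R}_{\ge 0}^{M \times J} \\
\text{STMF:} &\quad U \in \{0,1,2\}^{N \times J}, & V &\in \mathbb{R}_{\ge 0}^{M \times J}
\end{aligned}
```

In the semi- variants the continuous factor absorbs the scale of a real-valued $X$, while the discrete factor still yields memberships or ordered levels that are easy to interpret.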

Files (359.2 kB)

Name	Size	MD5
rikenbit/dcTensor-v1.2.3.zip	359.2 kB	66c211d2e057d1c080c3d099902c324c

Additional details

Is supplement to: https://github.com/rikenbit/dcTensor/tree/v1.2.3 (URL)

Statistic	All versions	This version
Views	139	39
Downloads	16	6
Data volume	3.8 MB	2.2 MB