Published July 27, 2020 | Version v1.0.0
Software Restricted

Classification of immune receptor repertoires using machine learning methods

  • 1. University of Liverpool
  • 2. University of Cambridge
  • 1. University of Cambridge
  • 2. University of Oxford
  • 3. Royal Surrey County

Description

R code to generate k-mer counts from a set of CDR3 sequences, and to classify samples based on these counts.

Contains 3 main functions:

NonPos_Matrix: A function to generate a matrix of k-mer counts from a matrix of CDR3 counts.
input:  InputFile - CSV file containing a CDR3-by-sample matrix of CDR3 counts.
           k - length of the k-mers to be counted.
           OutputFile - file to which the k-mer matrix is written; should end in .csv.gz.
output: writes the k-mer matrix to the specified OutputFile.
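
The archived code is restricted, so the following is only a minimal sketch of how non-positional k-mer counting of this kind might look in base R. The function name count_kmers_sketch, the toy data, and the choice to weight every k-mer occurrence by the count of the CDR3 it came from are all assumptions made for illustration, not the behaviour of NonPos_Matrix itself.

```r
# Hypothetical helper, NOT the archived NonPos_Matrix implementation.
# cdr3_counts: numeric matrix, rows named by CDR3 sequence, columns = samples.
count_kmers_sketch <- function(cdr3_counts, k) {
  seqs <- rownames(cdr3_counts)
  # All overlapping k-mers of one sequence (none if it is shorter than k).
  kmers_of <- function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    starts <- seq_len(n - k + 1)
    substring(s, starts, starts + k - 1)
  }
  per_seq <- lapply(seqs, kmers_of)
  all_kmers <- sort(unique(unlist(per_seq)))
  out <- matrix(0, nrow = length(all_kmers), ncol = ncol(cdr3_counts),
                dimnames = list(all_kmers, colnames(cdr3_counts)))
  for (i in seq_along(seqs)) {
    # Assumption: each k-mer occurrence adds the CDR3's count in every sample.
    for (km in per_seq[[i]]) {
      out[km, ] <- out[km, ] + cdr3_counts[i, ]
    }
  }
  out
}

# Toy CDR3-by-sample matrix (made-up values).
cdr3_counts <- matrix(c(2, 0,
                        1, 3),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("CASSL", "CASSF"), c("S1", "S2")))
kmer_counts <- count_kmers_sketch(cdr3_counts, k = 3)

# Write the result as a gzip-compressed CSV, mirroring the .csv.gz convention.
out_file <- tempfile(fileext = ".csv.gz")
write.csv(kmer_counts, gzfile(out_file))
```

Here the k-mer "CAS" occurs in both CDR3s, so its row is the sum of both CDR3 rows, while "SSL" and "SSF" each inherit the counts of a single CDR3.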

Pos_Matrix: A function to generate a matrix of positional k-mer counts from a matrix of CDR3 counts.
input:  InputFile - CSV file containing a CDR3-by-sample matrix of CDR3 counts.
           k - length of the k-mers to be counted.
           OutputFile - file to which the k-mer matrix is written; should end in .csv.gz.
output: writes the k-mer matrix to the specified OutputFile.
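
The description does not define "positional", so the sketch below assumes one plausible reading: a k-mer is treated as a distinct feature for each start position at which it occurs within the CDR3. The function name positional_kmers_sketch, the "kmer_position" row labels, and the toy data are hypothetical and are not taken from Pos_Matrix.

```r
# Hypothetical helper, NOT the archived Pos_Matrix implementation.
# Assumption: a positional k-mer is the k-mer plus its 1-based start position.
positional_kmers_sketch <- function(cdr3_counts, k) {
  seqs <- rownames(cdr3_counts)
  labels_of <- function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    starts <- seq_len(n - k + 1)
    paste0(substring(s, starts, starts + k - 1), "_", starts)
  }
  per_seq <- lapply(seqs, labels_of)
  features <- unique(unlist(per_seq))
  out <- matrix(0, nrow = length(features), ncol = ncol(cdr3_counts),
                dimnames = list(features, colnames(cdr3_counts)))
  for (i in seq_along(seqs)) {
    # Each positional k-mer adds the CDR3's count in every sample.
    for (f in per_seq[[i]]) {
      out[f, ] <- out[f, ] + cdr3_counts[i, ]
    }
  }
  out
}

# Toy CDR3-by-sample matrix (made-up values).
cdr3_counts <- matrix(c(2, 0,
                        1, 3),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("CASSL", "ASSLC"), c("S1", "S2")))
pos_counts <- positional_kmers_sketch(cdr3_counts, k = 3)
```

Under this reading, "ASS" starting at position 2 of CASSL and "ASS" starting at position 1 of ASSLC end up in separate rows, whereas a non-positional count would merge them.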

ClusterOptim: A function which clusters a k-mer count matrix based on its principal components (PCs), and identifies the set of PCs which generates the optimal clustering.
input:  file_name - CSV file containing a k-mer-by-sample matrix of k-mer counts, of the type generated by NonPos_Matrix and Pos_Matrix.
           classes - a vector of 1s and 2s, in which each entry corresponds to a column (sample) of the input file and indicates whether the sample is a case (=1) or a control (=2).
           plotAll=FALSE - logical; whether to plot the hierarchies for all PC combinations evaluated.
           outDir='Clusters' - name of the directory to which all output files are written.
output: writes 3 files: a plot of the optimal hierarchy, the accuracy of every PC combination evaluated, and a summary of those accuracies. Optionally also writes plots of the hierarchies for all PC combinations evaluated.
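
As a rough illustration of the idea described for ClusterOptim, the sketch below runs PCA on the samples, builds a hierarchical clustering for every subset of the first few PCs, cuts each tree into two groups, and scores the groups against the case/control labels. The function name cluster_pcs_sketch, the max_pcs limit, Euclidean distance, complete linkage, and the accuracy definition are all assumptions; the archived function may differ on every one of these points.

```r
# Hypothetical helper, NOT the archived ClusterOptim implementation.
# kmer_counts: k-mer-by-sample matrix; classes: 1 = case, 2 = control.
cluster_pcs_sketch <- function(kmer_counts, classes, max_pcs = 3) {
  # Samples are columns in the input, so transpose before PCA.
  pca <- prcomp(t(kmer_counts), center = TRUE, scale. = FALSE)
  n_pcs <- min(max_pcs, ncol(pca$x))
  # Enumerate every non-empty subset of the first n_pcs components.
  combos <- unlist(lapply(seq_len(n_pcs),
                          function(m) combn(n_pcs, m, simplify = FALSE)),
                   recursive = FALSE)
  accuracy <- vapply(combos, function(pcs) {
    tree <- hclust(dist(pca$x[, pcs, drop = FALSE]), method = "complete")
    groups <- cutree(tree, k = 2)
    # Cluster labels are arbitrary, so score both labelings and keep the better.
    max(mean(groups == classes), mean((3 - groups) == classes))
  }, numeric(1))
  # One row per PC combination, best accuracy first.
  data.frame(pcs = vapply(combos, paste, character(1), collapse = "+"),
             accuracy = accuracy)[order(-accuracy), ]
}

# Toy k-mer-by-sample matrix (made-up values): 3 cases, then 3 controls.
kmer_counts <- matrix(c(10, 11, 12,  1,  2,  1,
                         1,  2,  1, 10, 12, 11,
                         3,  3,  4,  3,  4,  3),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("CAS", "ASS", "SSL"), paste0("S", 1:6)))
classes <- c(1, 1, 1, 2, 2, 2)
result <- cluster_pcs_sketch(kmer_counts, classes)
```

The returned data frame plays the role of the "accuracy of all PC combinations" table; its first row names a PC subset with the highest accuracy. Enumerating all subsets grows exponentially with the number of PCs, which is why this sketch caps it with max_pcs.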

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

Access to these files must be requested. A request is accepted only if the following condition is met:

This code is limited to non-commercial use.
