Cell Maps Image Embedding

Description: {DESCRIPTION}

Version: {VERSION}

Usage

cellmaps_image_embeddingcmd.py [-h] --inputdir INPUTDIR [--model_path MODEL_PATH] [--provenance PROVENANCE] [--name NAME] [--organization_name ORGANIZATION_NAME] [--project_name PROJECT_NAME] [--fold FOLD] [--fake_embedder] [--dimensions DIMENSIONS]
                                  [--suffix SUFFIX] [--logconf LOGCONF] [--skip_logging] [--verbose] [--version]
                                  outdir

Outputs

The tool creates several files and folders in the specified output directory.
It generates a separate directory for each fold (e.g., 2.image_embedding_fold1, 2.image_embedding_fold2).
Below is the list and description of each output generated by the tool.

- image_emd.tsv:
    A tab-separated file containing the embedding generated for each image. Each row starts with a row label (e.g., BPTF), and the subsequent columns contain the components of the embedding vector.

            1	2	3	4
    BPTF	-0.037030112	-0.139459819	0.417184144	0.386600941
    KAT2B	0.02969132	-0.139459819	-0.038685802	0.136547908
    PARP1	-0.037030112	-0.139459819	0.540370524	0.119614214
    MSL1	0.18169874	-0.139459819	-0.038685802	0.152157351
    KAT6B	-0.037030112	-0.139459819	0.308141887	0.257056117

- labels_prob.tsv:
    A tab-separated file containing a probability score for each of the 28 possible protein labels (e.g., Nucleoplasm, N. membrane) for each image.

        Nucleoplasm	N. membrane	Nucleoli	N. fibrillar c.
    BPTF	0.740698278	0.270941526	0.147179633	0.149313971
    KAT2B	0.38626197	0.092356719	0.36738047	0.238842875
    PARP1	0.596435964	0.100168504	0.382214785	0.179471999
    MSL1	0.195862561	0.01370267	0.101418771	0.038516384
    KAT6B	0.606423676	0.101763181	0.337655455	0.201311186

- model.pth:
    The pre-trained DenseNet model used for image embedding.

- blue_resize:
    This directory contains the processed images for the blue channel.

- green_resize:
    This directory contains the processed images for the green channel.

- red_resize:
    This directory contains the processed images for the red channel.

- yellow_resize:
    This directory contains the processed images for the yellow channel.
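
The two TSV outputs above (image_emd.tsv and labels_prob.tsv) share the same shape: a header row, then one labeled row of numbers per image. The following is a minimal sketch of reading such a file with the Python standard library, assuming the first header cell belongs to the row-label column and may be empty. The helper names `read_labeled_tsv` and `top_label` are hypothetical and not part of the tool, and the inline sample reuses two rows from the labels_prob.tsv excerpt above.

```python
import csv
import io


def read_labeled_tsv(handle):
    """Return (column_names, {row_label: [float, ...]}) from a labeled TSV."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the first header cell sits above the row labels.
    columns = header[1:]
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows[record[0]] = [float(value) for value in record[1:]]
    return columns, rows


def top_label(columns, scores):
    """Return the (column_name, score) pair with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return columns[best], scores[best]


# Two rows copied from the labels_prob.tsv excerpt, tab-separated.
sample = (
    "\tNucleoplasm\tN. membrane\tNucleoli\tN. fibrillar c.\n"
    "BPTF\t0.740698278\t0.270941526\t0.147179633\t0.149313971\n"
    "MSL1\t0.195862561\t0.01370267\t0.101418771\t0.038516384\n"
)
columns, probs = read_labeled_tsv(io.StringIO(sample))
print(top_label(columns, probs["BPTF"]))
```

To read a real output file, pass an open file handle instead of the in-memory sample, for example `open("labels_prob.tsv", newline="")` inside one of the fold directories. The same reader works for image_emd.tsv, where the columns are the numbered embedding dimensions.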

Logs and Metadata
-----------------

- output.log:
    A log file detailing the activities and potential issues encountered during the image embedding process.

- error.log:
    If any errors occur during the execution of the script, they will be recorded in this log file.

- ro-crate-metadata.json:
    Metadata in [RO-Crate](https://www.researchobject.org/ro-crate) format, a community effort to establish a lightweight approach to packaging research data with their metadata.
    The main object contains an identifier (@id), a type (@type), a name, a description, keywords, and isPartOf, which describes the hierarchical relationship (organization and project).
    The @graph key contains an array of objects that describe other entities related to the main dataset:
    - a. Metadata, datasets, and software
    - b. Output files: details of the output files generated by the tool.
    - c. Images: details about specific image files, including keywords, descriptions, formats, and content URLs.
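
As a rough illustration of the layout described for ro-crate-metadata.json, the sketch below groups the entries of the @graph array by their @type. It assumes only what is stated above: a top-level object with an @graph key holding a list of objects that carry @id and @type. The function name `summarize_crate` and every identifier, type, and name in the sample are hypothetical placeholders, not values the tool is known to write.

```python
import json
from collections import defaultdict


def summarize_crate(crate):
    """Map each @type found in @graph to the list of @id values using it."""
    by_type = defaultdict(list)
    for entity in crate.get("@graph", []):
        types = entity.get("@type", "Unknown")
        # In JSON-LD, @type may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            by_type[type_name].append(entity.get("@id"))
    return dict(by_type)


# Hypothetical stand-in for the parsed contents of ro-crate-metadata.json.
sample = {
    "@id": "hypothetical-crate-id",
    "@type": "Dataset",
    "name": "Example image embedding",
    "@graph": [
        {"@id": "hypothetical-software-id", "@type": "Software"},
        {"@id": "hypothetical-dataset-id", "@type": "Dataset"},
        {"@id": "hypothetical-image-id", "@type": ["Dataset", "Image"]},
    ],
}
summary = summarize_crate(sample)
print(json.dumps(summary, indent=2))
```

For a real run, replace the sample with `json.load(handle)` on an opened ro-crate-metadata.json file; the type names that appear will depend on what the tool actually records.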
