This readme.txt file was generated on 2021-10-22 by 
Gabina Calderón-Rosete and Leonardo Rodríguez-Sosa.


GENERAL INFORMATION

1. Title of Dataset: Nucleotide sequences in Procambarus clarkii

2. Author Information
	A. Principal Investigator Contact Information
		Name: Leonardo Rodríguez-Sosa
		Institution: Facultad de Medicina, Universidad Nacional Autónoma de México  
		Address: Av. Universidad 3000 Ciudad Universitaria 04510 Ciudad de México, México
		Email: lrsosa@unam.mx

	B. Associate or Co-investigator Contact Information
		Name: Gabina Calderón-Rosete
		Institution: Facultad de Medicina, Universidad Nacional Autónoma de México 
		Address: Av. Universidad 3000 Ciudad Universitaria 04510 Ciudad de México, México
		Email: gcalderonrosete@yahoo.com.mx

3. Date of data collection: (Autumn) 2019-10-01 

4. Geographic location of data collection: Mexico City, México.

5. Information about funding sources that supported the collection of the data: Facultad de Medicina, UNAM, México


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: not applicable

2. Links to publications that cite or use the data: Not available 

3. Links to other publicly accessible locations of the data: All sequencing data reported here have been deposited in the GenBank database.   

4. Links/relationships to ancillary data sets: GenBank accession numbers for nucleotide sequences: 
MN110026
MT601680
MT601681
MT601679 
MT601682
MT601683
MT601684
MT942649
KY974273
MN110016
MN110021
MN110023
MN110012
MN110017
MT942642
MT942643
MT942647
MT942648
MN110024
MN110029
MF279133
MN110031
MN110025
MN110018
MN110034
KY974308
MT601685
MN110015
MF279134
MN110020
MN110019
MN110035
MN110013
MN110014
MT601688
MT601689
MN110027
MN110022
MN110033
MN110036
MN110028
MN110038
MH156427
MN110003
MN110004
MN110006
MN110005
MN110009
MN110007
MN110008
MT942646
MN110039
MT942644
MH156441
MN110037
QIA97593
QIA97594
MH156430.1
MN110030
MT601686
MG910470 
MW981273

5. Was data derived from another source?: No

6. Recommended citation for this dataset:
	Transcriptional identification of genes light-interacting in the extraretinal photoreceptors of crayfish Procambarus clarkii 
	Gabina Calderón-Rosete, Juan Antonio González-Barrios, Celia Piña-Leyva, Hayde Nallely Moreno-Sandoval, Manuel Lara-Lozano and Leonardo Rodríguez-Sosa (2021). ZooKeys, manuscript #73075 
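
The accession numbers listed under item 4 can be checked for format before they are used in a database query. The following is a minimal sketch, not part of the original workflow: the function name classify_accession is hypothetical, and the patterns cover only the common GenBank layouts (one letter plus five digits or two letters plus six or eight digits for nucleotide records; three letters plus five or seven digits for protein records), with an optional version suffix such as ".1". Under these patterns, entries such as QIA97593 follow the protein layout rather than the nucleotide one.

```python
import re

# Common GenBank accession layouts (assumption: RefSeq-style IDs with
# underscores and other special formats are out of scope for this sketch).
NUCLEOTIDE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}(?:\d{6}|\d{8}))$")
PROTEIN = re.compile(r"^[A-Z]{3}(?:\d{5}|\d{7})$")


def classify_accession(accession: str) -> str:
    """Return 'nucleotide', 'protein', or 'unknown' for one accession string."""
    # Drop surrounding whitespace and any version suffix such as ".1".
    base = accession.strip().split(".", 1)[0].upper()
    if NUCLEOTIDE.match(base):
        return "nucleotide"
    if PROTEIN.match(base):
        return "protein"
    return "unknown"


if __name__ == "__main__":
    # A few entries copied from the list above, including one with a
    # trailing space and one with a version suffix.
    for acc in ["MN110026", "MT601679 ", "MH156430.1", "QIA97593"]:
        print(acc.strip(), classify_accession(acc))
```

The check is purely syntactic; it does not confirm that a record exists in GenBank.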


DATA & FILE OVERVIEW

1. File List: 
APPENDIX S1: Alignments comparing protein sequences of Drosophila and Procambarus clarkii (pleonal nerve cord and eyestalk)
APPENDIX S2: List of nucleotide sequences referred to in Tables 1-7


2. Relationship between files, if important: All genes identified here from the pleonal nerve cord were edited for annotation and submitted to the GenBank database of the National Center for Biotechnology Information (NCBI).

3. Additional related data collected that was not included in the current data package: No

4. Are there multiple versions of the dataset?: No
	A. If yes, name of file(s) that was updated: 
		i. Why was the file updated? 
		ii. When was the file updated? 


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: 
Total RNA was extracted from the abdominal nerve cord using TRIzol reagent (Catalog number 15596018, Invitrogen Co., Carlsbad, CA, USA) following the manufacturer’s protocol. 
We used 5 μg of total RNA to prepare the cDNA libraries, following the manufacturer’s protocol for the Illumina TruSeq RNA Library Preparation Kit v2 (Catalog number RS-122-2001, Illumina, San Diego, CA, USA). Sequencing followed the Illumina paired-end protocol with 150 bp reads. The resulting library was sequenced with the MiSeq Reagent Kit v3 (Catalog number MS-102-3001) according to the manufacturer’s protocol to obtain the pleonal nerve cord transcriptome.

2. Methods for processing the data: Bioinformatic analysis. 

3. Instrument- or software-specific information needed to interpret the data:
The raw data from the Illumina system were uploaded to the Galaxy Web Portal for de novo assembly with Trinity software (Grabherr et al. 2011; Haas et al. 2013; Afgan et al. 2016; Oakley et al. 2014). 
The Phylogenetically Informed Annotation (PIA) pipeline (Speiser et al. 2014) was also used. 

4. Standards and calibration information, if appropriate: N/A 

5. Environmental/experimental conditions: 
We used four adult crayfish (P. clarkii), two males and two females, in the intermolt stage.

6. Describe any quality-assurance procedures performed on the data: 
The reads had quality scores higher than 30, so we did not apply any procedure to remove low-quality sequences. 

7. People involved with sample collection, processing, analysis and/or submission: 
Gabina Calderón-Rosete, Juan Antonio González-Barrios, Celia Piña-Leyva, Hayde Nallely Moreno-Sandoval, Manuel Lara-Lozano and Leonardo Rodríguez-Sosa
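
The quality-assurance statement in item 6 (read quality scores higher than 30) can be illustrated with a short check on a FASTQ file. This is only a sketch under stated assumptions, not the procedure the authors ran: it assumes Phred+33 quality encoding, it interprets the threshold as the mean score per read (the text does not say whether the score is per base or per read), and the function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    # Each record is: @id, sequence, '+' separator, quality string.
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header[1:], seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read (assumption: Phred+33 encoding)."""
    return mean(ord(ch) - offset for ch in quality)


def reads_below_threshold(lines: Iterable[str], threshold: float = 30.0) -> List[str]:
    """Return the IDs of reads whose mean Phred score is below the threshold."""
    return [rid for rid, _seq, qual in read_fastq(lines)
            if mean_phred(qual) < threshold]


if __name__ == "__main__":
    # Toy records for illustration only; 'I' encodes Phred 40, '5' encodes Phred 20.
    example = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
               "@r2\n", "ACGT\n", "+\n", "5555\n"]
    print(reads_below_threshold(example))
```

An empty result for a real file would be consistent with the statement that no low-quality filtering was needed.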

DATA-SPECIFIC INFORMATION FOR APPENDIX S1: Alignments comparing protein sequences of Drosophila and Procambarus clarkii (pleonal nerve cord and eyestalk)

1. Number of variables: 4

2. Number of cases/rows: 232

3. Variable List: 
The proteins encoded by the sequences of the eyestalk transcriptome and of Drosophila. 
4. Missing data codes: Not applicable

5. Specialized formats or other abbreviations used: Single letter codes used for all 20 amino acids. 


DATA-SPECIFIC INFORMATION FOR APPENDIX S2: List of nucleotide sequences referred to in Tables 1-7

1. Number of variables: Nine sets of transcripts that potentially interact with the phototransduction process 

2. Number of cases/rows: 1202

3. Variable List: 62 nucleotide sequences in FASTA format. 


4. Missing data codes: Not applicable 

5. Specialized formats or other abbreviations used: Codes for nucleotides.
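
Since the sequences in this appendix are described as FASTA records written with nucleotide codes, a reader may want to count the records and flag unexpected symbols. The sketch below shows one way to do that with the Python standard library. It is an illustration only: the function names are hypothetical, and the assumption that the file uses the IUPAC nucleotide alphabet is ours, since the text above only says "Codes for nucleotides".

```python
from typing import Dict, Iterable, Set

# Assumption: sequences use IUPAC nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    chunks: Dict[str, list] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {rid: "".join(parts) for rid, parts in chunks.items()}


def non_nucleotide_symbols(sequence: str) -> Set[str]:
    """Return any characters that are not IUPAC nucleotide codes."""
    return set(sequence.upper()) - IUPAC_NUCLEOTIDES


if __name__ == "__main__":
    # Toy input for illustration; it is not taken from the appendix.
    example = [">seq1 example record\n", "ACGTAC\n", "GTNN\n",
               ">seq2\n", "ACGX\n"]
    records = parse_fasta(example)
    print(len(records), "records")
    for rid, seq in records.items():
        print(rid, len(seq), sorted(non_nucleotide_symbols(seq)))
```

Applied to the appendix, the record count could be compared with the 62 sequences stated in item 3.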


