Published September 22, 2025 | Version v1
Computational notebook
Open
GenProtein Finder
Description
Welcome to GenProtein Finder!
A Python tool for analyzing protein distribution across multiple genomes, enabling bidirectional searches and comparative genomics analysis.
Features:
- Genome-to-Protein Search: Find all proteins present in a specific genome
- Protein-to-Genome Search: Identify which genomes contain a specific protein
- Occurrence Statistics: Analyze protein distribution and identify shared proteins across genomes
- Partial Name Matching: Search genomes using partial names (e.g., "GCA_")
- Interactive Menu: User-friendly command-line interface
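The searches listed above can be illustrated with a minimal sketch. It is not the script's actual code: the function names and the in-memory `genomes` dictionary (genome name mapped to a set of protein names) are hypothetical, and the case-insensitive matching is an assumption. The protein and genome names are taken from the example table in this description.

```python
from collections import Counter

# Hypothetical in-memory index: genome name -> set of protein names.
genomes = {
    "GCA_000001.1": {"CP000360.1_737", "CP000360.1_891"},
    "GCA_000002.1": {"CP000360.1_737", "CP000455.1_123"},
    "GCA_000003.1": {"CP000455.1_123", "CP000360.1_737"},
}


def proteins_in_genome(index, query):
    """Genome-to-protein search; `query` may be a partial genome name.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    q = query.lower()
    return {name: sorted(prots) for name, prots in index.items() if q in name.lower()}


def genomes_with_protein(index, protein):
    """Protein-to-genome search: genomes whose protein set contains `protein`."""
    return sorted(name for name, prots in index.items() if protein in prots)


def occurrence_counts(index):
    """Count in how many genomes each protein occurs."""
    counts = Counter()
    for prots in index.values():
        counts.update(prots)  # each genome contributes at most 1 per protein
    return counts


def shared_proteins(index):
    """Proteins present in every genome of the index."""
    sets = list(index.values())
    return set.intersection(*sets) if sets else set()
```

With this data, `genomes_with_protein(genomes, "CP000360.1_891")` returns `["GCA_000001.1"]`, and `shared_proteins(genomes)` returns `{"CP000360.1_737"}`, the only protein found in all three genomes.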
Requirements:
- Python 3.6+
- pandas
- openpyxl (for Excel file support)
Install required dependencies:
pip install pandas openpyxl
The script expects an Excel file with the following structure:
- Columns 1-3: Metadata (ignored by the script)
- Columns 4+: One column per genome. The column header is the genome name, and the cells below it list the protein names found in that genome
- Empty cells are automatically handled
Example:
| Meta1 | Meta2 | Meta3 | GCA_000001.1 | GCA_000002.1 | GCA_000003.1 |
|---|---|---|---|---|---|
| ... | ... | ... | CP000360.1_737 | CP000360.1_737 | CP000455.1_123 |
| ... | ... | ... | CP000360.1_891 | CP000455.1_123 | CP000360.1_737 |
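As a rough illustration of how a table in this layout could be read, the sketch below skips the first three metadata columns and collects the non-empty cells of each genome column. The names `build_genome_index`, `load_genome_index`, and `n_meta` are hypothetical and are not taken from the script. The third row of the sample data is a made-up placeholder that adds empty cells.

```python
import pandas as pd


def build_genome_index(df, n_meta=3):
    """Map each genome column to the set of protein names it contains.

    The first `n_meta` columns are treated as metadata and skipped.
    """
    index = {}
    for genome, column in df.iloc[:, n_meta:].items():
        # Drop empty cells, then normalise the rest to stripped strings.
        names = column.dropna().astype(str).str.strip()
        index[str(genome)] = set(names[names != ""])
    return index


def load_genome_index(path, n_meta=3):
    """Read an .xlsx file (pandas uses openpyxl for this) and index it."""
    return build_genome_index(pd.read_excel(path), n_meta=n_meta)


# In-memory stand-in for the Excel sheet, mirroring the example table.
example = pd.DataFrame(
    {
        "Meta1": ["...", "...", "..."],
        "Meta2": ["...", "...", "..."],
        "Meta3": ["...", "...", "..."],
        "GCA_000001.1": ["CP000360.1_737", "CP000360.1_891", None],
        "GCA_000002.1": ["CP000360.1_737", "CP000455.1_123", None],
        "GCA_000003.1": ["CP000455.1_123", "CP000360.1_737", None],
    }
)

index = build_genome_index(example)
```

Storing each genome as a set removes duplicate entries within a column and keeps membership tests cheap. Whether the script treats duplicates this way is not stated in this description.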
Usage:
- If you are on Linux and running from the terminal, give the script execute permission:
chmod +x genprotein_finder.py
- Update the file path in the script so that it points to your Excel file:
excel_file = "path/to/your/excel/file.xlsx"
- Run the script:
python3 genprotein_finder.py
Vinicius Henrique de Oliveira Franzote - vinicius.henrique@unesp.br
Files
| Name | Size | Checksum |
|---|---|---|
| GenProtein-Finder.zip | 6.5 kB | md5:7e6cf8188abdc8e0fb62ed4e39c2eeab |
Additional details
Software
- Repository URL: https://github.com/vinyheenryy94/GenProtein-Finder