# Project subgroups


## Subgroup 1: Bucket of Bugs to BioClip


Goal:
We want a tool that lets us take photographic collections of individual wild insects and iteratively pass them through something like BioCLIP to rapidly identify them and refine their taxonomic IDs.

Ideally, the program also runs offline, since it will be used by field techs gathering data in places where internet access may be poor.
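
A minimal sketch of the "iteratively refine" idea, assuming the classifier (e.g. BioCLIP) has already been run once per rank and the top label and score at each rank collected into a dict. The dict format, the `deepest_confident_id` name and the 0.5 threshold are all hypothetical. The function walks from coarse to fine ranks and keeps the finest rank that still clears the threshold. Because it works on already-computed scores, this step itself needs no network access.

```python
# Hypothetical sketch: keep refining an ID rank by rank while the model stays confident.
# These are the seven Linnaean ranks BioCLIP predicts; "tribe" is not among them.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_confident_id(predictions, threshold=0.5):
    """Return (rank, label, score) for the finest rank whose top score >= threshold.

    predictions: dict mapping a rank name to (top label, score) at that rank,
    e.g. collected by running a classifier once per rank (assumed format).
    Ranks missing from the dict are skipped. Returns None if the first
    available rank is already below the threshold.
    """
    best = None
    for rank in RANKS:
        if rank not in predictions:
            continue
        label, score = predictions[rank]
        if score < threshold:
            break  # confidence dropped: stop and keep the last confident rank
        best = (rank, label, score)
    return best


# Made-up scores, for illustration only.
example = {
    "order": ("Coleoptera", 0.99),
    "family": ("Carabidae", 0.93),
    "genus": ("Synuchus", 0.41),
}
print(deepest_confident_id(example))  # stops at family: the genus score is below 0.5
```

Lowering the threshold pushes IDs deeper at the cost of more wrong fine-rank calls.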


Members:
* Elizabeth Campolongo
* Ernie Parke
* Andy Quitmeyer
* Matt Thompson



Project Code:
https://github.com/Digital-Naturalism-Laboratories/bucket-o-bugs

## Subgroup 2: SDMs using AI-generated data

Goal: The broad goal is to take species data generated by computer vision and use those data for ecological applications. Specifically, we want to develop species distribution models (SDMs) and predict the suitable habitats of beetle species under various climate scenarios, using species data identified by computer vision.

Tasks: This goal requires two steps of work.

* The first task is to train a convolutional neural network (CNN) model on NEON beetle image data and use this model to identify the species of unknown beetle samples.
* The second task is to obtain climate data associated with the identified species and fit species distribution models that map each species' suitable habitat.
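
The second task could look roughly like the sketch below: a presence/background logistic regression on climate covariates, fit by plain gradient descent so it runs with the standard library only. This is a deliberately simple, swapped-in technique, not the group's method; real SDM work usually relies on dedicated tools (e.g. MaxEnt or GLM/GAM packages in R). The covariates, learning rate and epoch count are assumptions, and because background points are not true absences, the output is a relative suitability score rather than a probability of occurrence.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sdm(X, y, lr=0.1, epochs=2000):
    """Fit presence (1) vs background (0) points with logistic regression.

    X: list of covariate rows per point (e.g. standardized temperature and
    precipitation at each location); y: list of 0/1 labels.
    Returns (weights, bias) found by full-batch gradient descent on log-loss.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def suitability(w, b, x):
    """Relative habitat suitability (0 to 1) for one covariate vector x."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

Projecting under a climate scenario then amounts to calling `suitability` on each map cell's scenario covariates, standardized the same way as the training data.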

* Members:  
    - Khum Thapa-Magar, INSTAAR, University of Colorado
    - Sarwan Ali, Georgia State University
    - Hsunyi Hsieh, Michigan State University
    - Feel free to join the group if you like

Project Code: https://github.com/Imageomics/sdm-beetlepalooza


## Subgroup 3: Beetle ID and Ten Simple Rules

Goal: We want to see how far down the taxonomic hierarchy BioCLIP can get in identifying individual NEON beetles.

* Members:  
    - Sydne Record
    - Hilmar Lapp
    - Evan Waite
    - Laura Nagel
    - Kim Landsbergen
    - Isa Betancourt
    - Elizabeth Campolongo

---
- Run BioCLIP:
  - run 1: on segmented images
  - run 2: on unsegmented images
  - compare the two runs
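
One simple way to compare the two runs is top-k accuracy against the expert IDs, computed the same way for both (k = 1 and k = 3 match how results are discussed in these notes). A minimal sketch, assuming each run's output has been reduced to a best-first list of labels per image, aligned with the expert IDs; the function name and input layout are hypothetical.

```python
def top_k_accuracy(ranked_predictions, truths, k=1):
    """Fraction of images whose true taxon is among the first k predicted labels.

    ranked_predictions: one best-first list of labels per image.
    truths: expert IDs, aligned with ranked_predictions.
    """
    if len(ranked_predictions) != len(truths):
        raise ValueError("predictions and truths must be aligned")
    if not truths:
        return 0.0
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Compare runs by calling it on each with the same truths, e.g.
# top_k_accuracy(segmented_run, expert_ids, k=3) vs top_k_accuracy(unsegmented_run, expert_ids, k=3)
```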

- open-ended classification versus classification restricted to a list of known taxa

- Hilmar: 

run 1 - BioCLIP on 6 known images, run individually per barcode
         (samples run: 08984, 08914, 08980, 08976, 40688, 40713...);

taxonomists' assessment - in one case (08914), the runs converged to the same tribe (a group of genera); they did not for 08984;

run 2 - to the genus rank: `bioclip predict --rank genus [range of images]`
  outcome - tribes not correct

Evan: going from subfamily to tribe is a huge leap

Evan: Can we train BioCLIP to assess each image and stop at the tribe level?
Laura: a goal would be to get to genus; that would be a time-saver

Imagining an example in-person tech workflow:
  AI sorts to tribe, then a human tech works on identification below tribe
  being able to get to tribe automatically would eliminate a lot of work (keys would be needed)

Samples are from Wisconsin: same domain (D05), 2 different sites (UNDE, STEI)

Laura provided a file w/ all species found within Domain D05;
every unique species ID returned has been included in the list

^ this list is to be used in BioCLIP to limit identification to that domain-specific list
 filename: D05_TaxaList.txt
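
A minimal sketch of using the D05 list as a post-hoc filter, assuming the file has one taxon name per line and that open-ended predictions are available as taxon → probability over all candidate labels. If those probabilities come from a softmax, renormalizing over the allowed taxa gives the same result as a softmax over only those taxa; if only the top few predictions were kept, taxa outside that shortlist are lost. The function names are hypothetical and not part of BioCLIP.

```python
def load_taxa_list(lines):
    """Set of taxon names from an iterable of lines (one name per line, assumed)."""
    return {line.strip() for line in lines if line.strip()}


def restrict_to_taxa(scores, allowed):
    """Keep only allowed taxa and renormalize their scores to sum to 1.

    scores: dict of taxon -> probability from an open-ended classification.
    allowed: set of taxon names (e.g. from load_taxa_list).
    Returns [(taxon, score), ...] best first; empty if no allowed taxon was scored.
    """
    kept = {taxon: s for taxon, s in scores.items() if taxon in allowed}
    total = sum(kept.values())
    if total == 0:
        return []
    return sorted(((t, s / total) for t, s in kept.items()),
                  key=lambda pair: pair[1], reverse=True)


# Usage (assumed file layout):
# with open("D05_TaxaList.txt") as fh:
#     d05 = load_taxa_list(fh)
```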

The NEON Domain 05 list represents taxa from specimens already found and expert-verified,
but it is not the list of what *could* be there (which is a larger number)

Hilmar reran w/ the D05 list; Elizabeth helped w/ formatting table code

All efforts below use the D05 list with BioCLIP

Evan - both of the specimens below are different species
     - the AI found them to be different
     - but they are in the vial as the same species
A00000008980-06  
A00000008980-08  
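
Cases like this could be surfaced automatically by grouping per-individual predictions by vial and flagging vials whose individuals received different species. A sketch assuming individual IDs have the form `<vial barcode>-<index>`, as in the two IDs above; the function names are hypothetical. Flagged vials are candidates for expert re-checking, since either the model or the vial sorting may be wrong.

```python
from collections import defaultdict


def vial_of(individual_id):
    """Vial barcode for an individual ID, assuming the form '<vial>-<index>'."""
    return individual_id.rsplit("-", 1)[0]


def flag_mixed_vials(predicted):
    """Return {vial: sorted species} for vials whose individuals got different species.

    predicted: dict of individual ID -> predicted species label.
    """
    by_vial = defaultdict(set)
    for individual_id, species in predicted.items():
        by_vial[vial_of(individual_id)].add(species)
    return {vial: sorted(s) for vial, s in by_vial.items() if len(s) > 1}
```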

40688 - ran 10 subsets from this vial - correct ID is Synuchus impunctatus
  
40713 - correct ID for Bembidion transparens
40688 - using the full image - with all the beetles in the image

Running it beetle by beetle: the ability to ID to the correct taxon is variable
Running it as a full image with all beetles included: the correct ID is in the top 3

Conversation about how to optimize photos, based on Evan's high-res images of their specimens
now running EWIC_00001460, EWIC_0000353, EWIC_0000799, EWIC_0000801, EWIC_0001164

Day 3 wrap-up

Sydne re-ran what we did yesterday after removing subspecies data,
and created summaries at the tribe and subfamily levels:
what were the scores for each image?

Laura has been data wrangling to evaluate the cumulative scores at each of those taxonomic levels,
 assigning flags at each level for what was right or wrong
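
The cumulative scores can be sketched as summing species-level scores into their higher-rank taxon, then flagging whether the top taxon at that rank matches the expert ID. This assumes one image's species → score dict and a species → lineage lookup (e.g. built from the D05 list plus a taxonomy); the names and layout are hypothetical. It also shows why a wrong top species can still roll up to the right tribe.

```python
from collections import defaultdict


def aggregate_scores(species_scores, lineage, rank):
    """Sum species-level scores into cumulative scores at `rank`.

    species_scores: dict of species -> score for one image.
    lineage: dict of species -> {rank name: taxon}, e.g. {"genus": ..., "tribe": ...}.
    """
    totals = defaultdict(float)
    for species, score in species_scores.items():
        taxon = species if rank == "species" else lineage[species][rank]
        totals[taxon] += score
    return dict(totals)


def flag_at_rank(species_scores, lineage, rank, true_taxon):
    """Return (top taxon, its cumulative score, "right" or "wrong") at `rank`."""
    totals = aggregate_scores(species_scores, lineage, rank)
    top = max(totals, key=totals.get)
    return top, totals[top], "right" if top == true_taxon else "wrong"
```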

Isa: it would be interesting to evaluate the number of training images against the right/wrong flag

Elizabeth put together a script on CyVerse that summarizes the training images for BioCLIP: for the 36 genera, how many training images BioCLIP had for each
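
Combining per-genus training counts with the right/wrong flags could address Isa's question. A minimal sketch, assuming results are reduced to (true genus, flag) pairs and counts to a genus → image-count dict; genera missing from the counts are treated as having zero training images, which is an assumption. The function name is hypothetical, and a difference in means is descriptive only.

```python
from statistics import mean


def training_counts_by_flag(results, training_counts):
    """Mean training-image count of the true genus, split by right/wrong flag.

    results: list of (true_genus, flag) pairs, flag being "right" or "wrong".
    training_counts: dict of genus -> number of training images.
    Returns {"right": mean or None, "wrong": mean or None}.
    """
    groups = {"right": [], "wrong": []}
    for genus, flag in results:
        groups[flag].append(training_counts.get(genus, 0))  # missing genus -> 0 images
    return {flag: mean(counts) if counts else None for flag, counts in groups.items()}
```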

Hilmar is battling to get the individually segmented images ready so that each image runs with its own domain reference list. The file structure needs shuffling and wrangling.

Goal for tomorrow: run BioCLIP on all of the segmented images with the newly wrangled dataset (thank you, Hilmar!)


- - - - - - - - - - - - 

Members: Sydne Record, Isabelle Betancourt, Evan Waite, Laura Nagel, Kim Landsbergen, Hilmar Lapp, Elizabeth Campolongo

Group 3 code is in a [group 3 folder](https://github.com/Imageomics/BeetlePalooza-2024/tree/main/Group3) in this repository.

## Subgroup 4: EcoPalette: Integration of environmental data into species images to improve model accuracy 

Members: Alyson East, Nicholas Gunner, Brennan Hays, Daniel Lopez, Isabella Viney 

Subgroup goals: 
* Represent ecosystem metadata visually on beetle images
* Improve an AI model's confidence in classifying beetle species using visualized metadata
* Assess the importance of image-encoded metadata to model accuracy
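
One hypothetical way to encode metadata in the image is to append a strip below the beetle where each ecosystem variable gets a block colored on a blue-to-red scale over an assumed value range. This is illustrative only; EcoPalette's actual encoding may differ. Images are plain nested lists of RGB tuples to keep the sketch dependency-free, and the variables, ranges and strip height are assumptions.

```python
def value_to_rgb(value, lo, hi):
    """Map a metadata value to a blue (lo) to red (hi) color, clipping to [lo, hi]."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))


def append_metadata_strip(image, values, ranges, strip_height=4):
    """Return a copy of image (rows of RGB tuples) with a metadata strip added below.

    Each variable gets an equal-width block of columns colored by value_to_rgb;
    the last block absorbs any leftover columns.
    """
    width, n = len(image[0]), len(values)
    if n == 0 or n != len(ranges) or width < n:
        raise ValueError("need 1..width values, each with a (lo, hi) range")
    block = width // n
    strip_row = []
    for i, (value, (lo, hi)) in enumerate(zip(values, ranges)):
        end = width if i == n - 1 else (i + 1) * block
        strip_row.extend([value_to_rgb(value, lo, hi)] * (end - i * block))
    return [list(row) for row in image] + [list(strip_row) for _ in range(strip_height)]


# e.g. a temperature of 40 on an assumed 0-40 range renders as pure red
```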

Workflow: 
* Segment NEON vial-level images of ground beetles into individual beetle images (thanks to Sarwan Ali and Michelle Ramirez)
* Subset beetle image dataset to include only 5 beetle species for proof-of-concept simplicity
* Identify abiotic and biotic ecosystem features of interest based on relevance to beetle niche
* Extract NEON ecosystem data of interest for the year 2018 and link them to beetle images
* Train and test AI models in identifying beetle species from (1) beetle image subset including image-encoded metadata and (2) beetle image subset NOT including image-encoded metadata
* Compare AI model accuracy from (1) and (2) above
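
Because models (1) and (2) would be scored on the same test images, a paired test is a natural way to compare their accuracy. The sketch below uses the exact McNemar test, a standard choice swapped in here rather than the group's stated method: it looks only at images where exactly one model is correct and asks whether those disagreements are balanced (Binomial(n, 0.5) under no difference). The function name is hypothetical.

```python
from math import comb


def mcnemar_exact(correct_with, correct_without):
    """Exact two-sided McNemar test for two models scored on the same images.

    correct_with / correct_without: one boolean per test image, saying whether the
    model with / without image-encoded metadata got that image right.
    Returns (b, c, p_value): b = right only with metadata, c = right only without.
    """
    if len(correct_with) != len(correct_without):
        raise ValueError("both models must be scored on the same images")
    b = sum(1 for w, wo in zip(correct_with, correct_without) if w and not wo)
    c = sum(1 for w, wo in zip(correct_with, correct_without) if wo and not w)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no disagreements, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```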

Project code: https://github.com/Imageomics/EcoPalette/tree/main

## Subgroup 5: Easy Traits with ML

Members: Isadora Fluck, Michelle Ramirez, Jennifer Girón, S M Rayeed, Ekaterina Nepovinnykh, Dhanyapriya Somasundaram, Hojin Yoo, Sydne Record

Goal: automate trait measurements from images

Workflow: 
577 images of the beetles + code (Python + R):

Code:
Input: image with multiple individuals
Output: a data table with columns:
- pictureID (that is linked to speciesID, plotID, siteID, etc);
- individualID (that can be linked to the individual images);
- elytra area;
- elytra width;
- elytra length;
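
A minimal sketch of the three elytra columns, assuming a binary mask of one individual's elytra is already available (e.g. from segmentation), that the beetle's long axis is vertical in the image, and that a millimetres-per-pixel scale is known. Width and length are taken as bounding-box extents, which is a simplification. The function name and dict keys are hypothetical and need not match the predictbeetle repo.

```python
def elytra_measurements(mask, mm_per_pixel):
    """Area (mm^2), width and length (mm) of a binary elytra mask (rows of 0/1).

    Assumes the long axis is vertical; width and length are the horizontal and
    vertical extents of the mask's bounding box.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return {"elytra_area": 0.0, "elytra_width": 0.0, "elytra_length": 0.0}
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "elytra_area": len(coords) * mm_per_pixel ** 2,
        "elytra_width": (max(cols) - min(cols) + 1) * mm_per_pixel,
        "elytra_length": (max(rows) - min(rows) + 1) * mm_per_pixel,
    }
```

Each output row would then pair these values with the pictureID and individualID of the crop the mask came from.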

Group5_b: Project Code: https://github.com/yoohj0416/predictbeetle

## Subgroup 6: Gaps in Current Models and What Actually Matters

Members: Nathan, Blair, Alec, Parkash

Grad-CAM for BioCLIP: https://github.com/mirkab/BeetlePalooza_2024_Mirka

Grad-CAM for ResNet-50: https://github.com/parkash-ps/Imageomics-Beetlepalooza-2024

Goals: 
- Identify where and why current CV models misidentify beetle species.
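
Grad-CAM weights each feature map A^k of a chosen layer by the spatially averaged gradient of the target class score, alpha_k = (1/HW) Σ_ij ∂y^c/∂A^k_ij, and keeps the positive part of Σ_k alpha_k A^k. The framework-free sketch below shows only that arithmetic on plain nested lists. In the linked repos, the activations and gradients would presumably come from a deep-learning framework's autograd, and for a vision-transformer backbone the patch tokens would first need reshaping into a spatial grid; both points are assumptions about their setup, not a description of it.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one layer.

    activations: K feature maps A^k, each an H x W list of lists.
    gradients: dy_c/dA^k for the target class score y_c, same shape.
    Returns an H x W map ReLU(sum_k alpha_k * A^k), where alpha_k is the spatial
    mean of the k-th gradient map, scaled so its maximum is 1 (if positive).
    """
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in g) / (H * W) for g in gradients]
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Upsampled to the input size and overlaid on the beetle image, the map highlights which regions pushed the model toward its (possibly wrong) species call.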