Dataset work:

note: knb.ecoinformatics.org might be a good place to find some data


dataset_208: coordinates are in UTM ... for sites, how do I assign sites for samples without a location? there are multiple sampling locations in a given site (210 sites total). Should I average the utm's?

d210: No spatial data. Sent email to the webmaster at Cedar Creek to request GIS data (this is the provided protocol for doing so). In addition, I requested GIS data for experiments that may be useful in addtion to d210, including: PLE014, PLE054, PLE172, and PLE061. See data catalogue: http://www.cbs.umn.edu/explore/field-stations/cedarcreek/research/data

d213: Problem in coordinates ... coordinates are given in meters relative to quadrate points. No record of lats and longs that I can find.

d223: No spatial data. Sent spatial data request to my friend who works at Sevilleta.

d225: Downloaded environmental data, put in new scratch folder for spatial data. Extracted coordinate data (provided in UTM) and converted to lat/long. Stored in newly created document in core-transient-datasets ... "site_data.csv"). For the species abundance dataset, it will require running the data prep script entitled "data_prep.r" that will convert the data from long format to wide format. Information for dataset_225 has been filled out completely in the data_source_table and is ready for analysis.

d226: Got lat/long data from Dornelas (1 site), added a site column, added to site_data spreadsheet. Information for dataset_226 has been filled out completely in the data_source_table and is ready for analysis.

d227: BBS, waiting on this one.

d228: Was listed as obtained but this was likely a mistake. I obtained d228 from the Hubbard Brook website and stitched together the datasets from the 4 sites used in the study. Location data were not provided as lats and longs. I contacted Scott Sillett (someone I know from SMBC) to see if he could provide these data. Information for dataset_226 has beenfilled out completely in the data_source_table and the dataset will be ready for analysis once I get the coordinate data.

d229: CBC, waiting on this one.

d230-d234: I have these listed as obtained, but they are not on github. They may be on my computer in the lab. Waiting on these until I go into the office.

Update: d232 and d234 have been added to the core-transient-datasets folder. spatial data are still needed for d232. d234 has only one site and the lat and long of the site is provided. d234 is ready for analysis and the information for this dataset has been filled out completely in the data_source_table.

d236-d238: There was no spatial data provided for these studies. I sent the principle investigator, Douglas Kelt, an email requesting site coordinates.

d239: This one seems like it will require considerably more work. There are lat/long data attached to physiochemical water quality samples, but the sample-ID's for these data do not follow the same naming convention as the community data. Ack!

d240: data have been obtained. site and lat long data are within the data file itself. Data are prepared for analysis and the data_souce_table is complete, though this dataset may not be appropriate because there are only 7 species present.

To do tomorrow:

- Send Jes an email regarding the location of the R files that she had posted to PBWorks
- Wherever possible, prepare the site file for the test analysis
- Continue to format site data for the available datasets
- Obtain datasets from ecological archives using the EcoData Retriever
- Follow up on emails regarding site data.
- Send email to Megan for more site data from Sevilleta
- Check the dataset Allen posted an issue on (Cnidarians, I believe)
- Send a data request email to Art Shapiro for butterfly data
- Column for long or wide format in the data_source_table ... check through each dataset to determine the current format



