# Expected vs. Observed Peaks

## Running the Experiment

First, set up the sequencing data. Change into `seq_data/`, then run the following to download and unpack `seqdata.tar.gz`:

```
wget https://data.cyverse.org/dav-anon/iplant/home/vhaghani26/Rocketchip_Data/seqdata.tar.gz
tar -xvzf seqdata.tar.gz
mv seqdata/* .
rm -rf seqdata/
```

Go back to the `exp_vs_obs/` directory and open `exp_vs_obs.py` in a text editor of your choice. In the "User-specific variables" section (within "Import Modules"), set the working directory variable to your working directory (without a `/` at the end of the directory name) and the authors variable to your name. Save and close the file.

Now, run the script. It creates all project files and snakefiles, executes the snakefiles to conduct the analysis, carries out the statistics, and visualizes the results. The one exception is the heatmaps used in the paper, which are generated by [generate_heatmap.ipynb](https://github.com/vhaghani26/rocketchip_tests/blob/main/exp_vs_obs/generate_heatmap.ipynb). Please note that the run takes a long time because of the number of tests. Once you have made the appropriate changes, start it with:

```
python3 exp_vs_obs.py
```

## Generating the Sequence Data

The sequence data were generated by `seq_data/datasynth`, a program written by Alan Zhang and Ian Korf to model ChIP-seq data. A CSV file, `seq_data/to_generate.csv`, contains the parameters for each generated data set. `seq_data/generate.py` reads `seq_data/to_generate.csv` and calls `seq_data/datasynth` to generate the ChIP-seq data with the specified parameters. This also creates a log, `seq_data/log`, that documents each datasynth command used. The data were generated by running the following:

```
python3 generate.py ./datasynth 2> log
```
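
For readers unfamiliar with this pattern, the following is a minimal sketch of a CSV-driven command generator. It is not the actual `generate.py`. The column names (`name`, `reads`, `peaks`), the `--column value` flag format, and the `build_commands` helper are all hypothetical placeholders and do not reflect datasynth's real interface. The only details taken from the command above are that the path to datasynth is supplied by the caller and that the command log is captured from stderr.

```python
import csv
import io
import shlex
import sys


def build_commands(program, csv_text):
    """Return one argv list per CSV row: the program, then --column value pairs."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        argv = [program]
        for column, value in row.items():
            # Hypothetical flag format; datasynth's real options may differ.
            argv += [f"--{column}", value]
        commands.append(argv)
    return commands


# Hypothetical parameter table standing in for to_generate.csv.
EXAMPLE_CSV = "name,reads,peaks\nsingle_narrow_1,1000,5\npaired_broad_1,2000,3\n"

commands = build_commands("./datasynth", EXAMPLE_CSV)
for argv in commands:
    # Echo each command to stderr so that `2> log` would record it.
    print(shlex.join(argv), file=sys.stderr)
    # A real driver would run the command here, e.g. subprocess.run(argv, check=True).
```

Writing the commands to stderr keeps the log separate from anything the generator itself prints to stdout.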

The generated directories were then organized using:

```
mkdir paired_broad paired_narrow single_broad single_narrow

for dir in paired_broad_*; do
    new_name="test_$(echo "$dir" | sed 's/paired_broad_//')"
    mv "$dir" "paired_broad/$new_name"
done

for dir in paired_narrow_*; do
    new_name="test_$(echo "$dir" | sed 's/paired_narrow_//')"
    mv "$dir" "paired_narrow/$new_name"
done

for dir in single_broad_*; do
    new_name="test_$(echo "$dir" | sed 's/single_broad_//')"
    mv "$dir" "single_broad/$new_name"
done

for dir in single_narrow_*; do
    new_name="test_$(echo "$dir" | sed 's/single_narrow_//')"
    mv "$dir" "single_narrow/$new_name"
done
```
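
The four loops above apply the same rule: a directory named `<group>_<suffix>` becomes `<group>/test_<suffix>`. As an illustration only, here is a Python sketch of that rule. It is not the code that was used to organize the data, and the `organize` function and the demonstration directory names are assumptions made for this example.

```python
from pathlib import Path
import tempfile

GROUPS = ["paired_broad", "paired_narrow", "single_broad", "single_narrow"]


def organize(root):
    """Move each <group>_<suffix> directory under root to <group>/test_<suffix>."""
    root = Path(root)
    moved = []
    for group in GROUPS:
        (root / group).mkdir(exist_ok=True)
        # sorted() materializes the matches before any directory is moved.
        for src in sorted(root.glob(f"{group}_*")):
            if not src.is_dir():
                continue
            suffix = src.name[len(group) + 1:]  # text after "<group>_"
            dest = root / group / f"test_{suffix}"
            src.rename(dest)
            moved.append(dest.relative_to(root).as_posix())
    return moved


# Demonstration on a throwaway directory with made-up names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["paired_broad_1", "paired_broad_2", "single_narrow_1"]:
        (Path(tmp) / name).mkdir()
    result = organize(tmp)
    print(result)
```

Like the shell loops, the sketch creates all four group directories even when a group has no matching input.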

Once the data were generated and organized, we packaged them into `seqdata.tar.gz` for distribution and storage.
