## PtrGNCommitMsg

### Dependencies

#### Environment Dependencies

`python` >= 3.6

`pip` >= 18.1

#### Install Dependency Libraries

There are two requirements files: `requirements-cpu.txt`, which includes the CPU version of `tensorflow`, and `requirements-gpu.txt`, which includes the GPU version.

Install the dependencies from whichever file matches your hardware:

```bash
# install with cpu version of tensorflow
$ pip install -r requirements-cpu.txt

# or, install with gpu version of tensorflow
$ pip install -r requirements-gpu.txt
```
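Before installing, it can help to confirm that the interpreter and `pip` meet the minimums listed above. The following is a minimal sketch, not part of the repository. The helper name `version_tuple` is hypothetical, and the check assumes `pip` is importable from the same interpreter.

```python
import re
import sys

# Minimum versions taken from the "Environment Dependencies" section.
MIN_PYTHON = (3, 6)
MIN_PIP = (18, 1)


def version_tuple(text):
    """Hypothetical helper: '18.1' -> (18, 1), '20.0.2' -> (20, 0, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


if __name__ == "__main__":
    print("python ok:", sys.version_info[:2] >= MIN_PYTHON)
    try:
        import pip  # assumption: pip is installed for this interpreter
        print("pip ok:", version_tuple(pip.__version__) >= MIN_PIP)
    except ImportError:
        print("pip is not importable from this interpreter")
```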



### Train

The file `train.py` is the entry point for training. To train the model, run `python train.py` with the following parameters:

| Name              | Value                                                        | Note                                                         |
| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| prefix            | str. 'static/data/top1000/top1000' or 'static/data/top1000lowercase/top1000lowercase' or 'static/data/top2000/top2000' or 'static/data/top2000lowercase/top2000lowercase'. | The path prefix of the training files.                       |
| restore           | bool. Default is False.                                      | Whether to continue a previous training run. If true, global_step and epoch_i must also be set. |
| global_step       | int.                                                         | The global step to continue from.                            |
| epoch_i           | int. Default is 1.                                           | The epoch to continue from.                                  |
| optimizer         | str. 'sgd' or 'adam'.                                        | The optimizer algorithm.                                     |
| train_type        | str. 'pointer' or 'no_pointer'.                              | Whether to use the pointer-generator network.                |
| attention_option  | str. 'bahdanau', 'luong', 'scaled_luong', or 'normed_bahdanau'. | The attention mechanism to use.                              |
| lr                | float. Default is 0.0001.                                    | The initial learning rate.                                   |
| batch_size        | int. Default is 64.                                          | The batch size.                                              |
| init_op           | str. 'glorot_uniform', 'uniform', or 'glorot_normal'.        | The initializer algorithm.                                   |
| early_stop_option | str. 'loss' or 'bleu'.                                       | Whether early stopping uses the loss or the BLEU score on the validation dataset. |
| early_stop_init   | float. Default is 0.                                         | The initial best value for early stopping.                   |
| bleu_script       | str.                                                         | The absolute path of the BLEU script from the [mosesdecoder](https://github.com/moses-smt/mosesdecoder) project. |

For example:

```bash
$ python train.py \
    --prefix=static/data/top1000lowercase/top1000lowercase \
    --train_type=pointer \
    --optimizer=adam \
    --batch_size=16 \
    --init_op=glorot_normal \
    --bleu_script=/home/username/mosesdecoder/scripts/generic/multi-bleu.perl \
    --early_stop_option=bleu \
    --early_stop_init=0
```
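The table notes that `restore` requires `global_step` and `epoch_i` to be set as well. The sketch below shows one way to enforce that rule when launching training from a script. `build_train_command` is a hypothetical helper, not part of the repository, and passing the flag as `--restore=True` is an assumption based on the `--name=value` style of the example above.

```python
import shlex


def build_train_command(prefix, restore=False, global_step=None,
                        epoch_i=None, **options):
    """Hypothetical helper: build the argument list for `python train.py`."""
    # Rule from the parameter table: restoring needs both values.
    if restore and (global_step is None or epoch_i is None):
        raise ValueError("restore=True requires global_step and epoch_i")

    args = ["python", "train.py", f"--prefix={prefix}"]
    if restore:
        # Assumption: the bool flag accepts the same --name=value form.
        args += ["--restore=True",
                 f"--global_step={global_step}",
                 f"--epoch_i={epoch_i}"]
    # Remaining parameters (optimizer, train_type, ...) pass through as-is.
    for name, value in sorted(options.items()):
        args.append(f"--{name}={value}")
    return args


if __name__ == "__main__":
    cmd = build_train_command(
        "static/data/top1000lowercase/top1000lowercase",
        train_type="pointer", optimizer="adam", batch_size=16)
    # Print a shell-safe command line; use subprocess.run(cmd) to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```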

Summaries for `tensorboard` are written to `static/summary`. Run `tensorboard` as follows:

```shell
$ tensorboard --logdir=static/summary
```



### Prediction

After training and obtaining the models, we can predict commit messages on the test dataset.

The parameters `prefix` and `global_step` must be set. In addition, `prefix`, `train_type`, and `batch_size` must have the same values as during training, and `global_step` must point to a model saved in `static/data/model`.

For example:

```bash
$ python -m postprocess.get_predicted_targets \
    --prefix=static/data/top1000lowercase/top1000lowercase \
    --global_step=33948 \
    --train_type=pointer \
    --optimizer=adam \
    --batch_size=16
```

After obtaining the predicted results for the test dataset, we can calculate BLEU scores with the script from the [mosesdecoder](https://github.com/moses-smt/mosesdecoder) project as follows:

```shell
$ git clone git@github.com:moses-smt/mosesdecoder.git
# The files reference-sentences-filename and prediction-sentences-filename contain 
# multiple lines, and each line is a sentence.
$ mosesdecoder/scripts/generic/multi-bleu.perl reference-sentences-filename < prediction-sentences-filename
```
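Because both files hold one sentence per line, a mismatch in line counts usually means the predictions and references are out of step. The sketch below checks for that before calling the script. The helper names are hypothetical, and it assumes `perl` is on the `PATH` and that the files are UTF-8 encoded.

```python
import subprocess


def count_lines(path):
    """Count the lines (one sentence each) in a text file."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(reference_path, prediction_path):
    """Hypothetical helper: ensure both files have the same number of lines."""
    ref_n = count_lines(reference_path)
    pred_n = count_lines(prediction_path)
    if ref_n != pred_n:
        raise ValueError(f"{ref_n} references but {pred_n} predictions")
    return ref_n


def run_multi_bleu(script_path, reference_path, prediction_path):
    """Run multi-bleu.perl the same way as the shell command above."""
    check_parallel(reference_path, prediction_path)
    # Predictions are fed on stdin; the reference file is an argument.
    with open(prediction_path, "rb") as predictions:
        result = subprocess.run(
            ["perl", script_path, reference_path],
            stdin=predictions, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8").strip()
```

For example, `run_multi_bleu("mosesdecoder/scripts/generic/multi-bleu.perl", "reference-sentences-filename", "prediction-sentences-filename")` returns the script's output as a string.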

ROUGE scores can be calculated with [sumeval](https://github.com/chakki-works/sumeval) as follows:

```shell
$ pip install sumeval
$ sumeval r-nl -f prediction-sentences-filename reference-sentences-filename -in
```

