This dataset consists of all of the data used for code style probing.
-the evaluation set is included in the Data zip
-all results are untokenized so that they can be retokenized with different NL prompts, or without prompts
-the untokenized versions also allow truncation to be addressed

curated_eval_set.zip
-this file contains the eval sets used in testing and in getting the scores in the paper

Labeled_Code.tar.zip
-raw data for the parallel corpora; these csvs contain code samples with transformations applied to them
-code is not truncated in these examples
-filtering is not done for cases where input == label; as a result, all bq files have the same length

-'full_data_acl'
    -training data for the codet5 cluster classifier
    -label: cluster labels, input: code in the 'content' column
-'data_full_uncommented_no_outlier.csv'
    -do not use; uncommenting transformation on a subset of the data
-'bq_data_outliers_acl'
    -BigQuery data with outliers; the transformations were originally derived from this data
-'bq_data_no_outliers_acl'
    -BigQuery data with outliers removed
-'bq_uncommented_no_outlier.csv'
    -do not use; uncommenting transformation on the no-outlier subset
-'bq_uncommented_outlier.csv'
    -parallel corpora for uncommenting
    -label: 'content', input: 'uncommented_content'
-'bq_data_no_outlier_no_class.csv'
    -do not use; class transformation on the no-outlier subset
-'bq_data_outlier_no_class.csv'
    -parallel corpora for removing classes
    -the uncommented_content column was deleted from this file; get it from bq_uncommented_outlier.csv
    -label: 'uncommented_content', input: 'no_class_content'
-'bq_data_uncased.csv'
    -parallel corpora for removing casing
    -the uncommented_content column was deleted from this file; get it from bq_uncommented_outlier.csv
    -label: 'uncommented_content', input: 'casing_content'
-'eval_set_no_class.csv'
    -eval set for testing the class parallel corpora, with all super() removed
-'bq_data_uncomp_fixed_outlier.csv'
    -list comprehension parallel corpora
    -label: 'uncommented_content', input: 'uncomped_content'
-'bq_data_outlier_no_class_no_super.csv'
    -training data for the class parallel corpora, with super() removed
    -gold label: 'content', input: 'no_class_content'
-'parallel_no_decorators.csv'
    -parallel corpora with decorators removed in the labels
    -gold label: 'uncommented_content', input: 'decorator_modified'
-'bq_no_docstring_outlier_acl.csv'
    -updated docstring file containing code with docstrings removed
    -full-length code samples
-'docstring_method_split.csv'
    -updated docstring file containing code with docstrings removed, in method format instead of full files
    -whenever using docstrings, tokenize this file and split as needed
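
The notes above say that uncommented_content was deleted from some files and should be taken from bq_uncommented_outlier.csv, and that pairs where input == label are not filtered. The following is a minimal sketch of how both steps might be handled, not the authors' pipeline. It assumes that rows in the two files line up by position (suggested by the note that all bq files have the same length, but not confirmed here). The helper names read_rows, restore_column and to_pairs are hypothetical, and the sample rows are made up for illustration.

```python
import csv

# Assumption: full, untruncated code files can exceed the csv module's
# default field size limit, so raise it before reading.
csv.field_size_limit(2**31 - 1)


def read_rows(path):
    """Read a csv file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def restore_column(target_rows, donor_rows, column="uncommented_content"):
    """Copy `column` from donor_rows into copies of target_rows, by position.

    Assumption: both files keep the same row order. A length mismatch is
    treated as a sign that this assumption does not hold.
    """
    if len(target_rows) != len(donor_rows):
        raise ValueError("row counts differ; positional alignment is unsafe")
    restored = []
    for target, donor in zip(target_rows, donor_rows):
        row = dict(target)  # copy so the caller's rows are left untouched
        row[column] = donor[column]
        restored.append(row)
    return restored


def to_pairs(rows, input_col, label_col, drop_identical=True):
    """Build (input, label) pairs, optionally dropping input == label."""
    pairs = [(row[input_col], row[label_col]) for row in rows]
    if drop_identical:
        pairs = [(src, tgt) for src, tgt in pairs if src != tgt]
    return pairs


# Made-up rows standing in for bq_data_outlier_no_class.csv (target)
# and bq_uncommented_outlier.csv (donor).
target = [
    {"no_class_content": "def f(): pass"},
    {"no_class_content": "x = 1"},
]
donor = [
    {"uncommented_content": "class A:\n    def f(self): pass"},
    {"uncommented_content": "x = 1"},
]

rows = restore_column(target, donor)
pairs = to_pairs(rows, input_col="no_class_content", label_col="uncommented_content")
```

With real files, target and donor would come from read_rows. If the files share an identifier column, joining on that column would be safer than relying on row order.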