# Methodology #

- Considered two main ATLAS papers		
	- SC98	
	- PC (J. Parallel Computing) 2001	
- Used Publish or Perish (PoP) to download metadata about the papers that cite each of these two papers
	- PoP has a limitation: it returns at most 1,000 results
	- SC98: Google Scholar reports 1,442 citing papers, but PoP returned only 998 results (under the 1,000 limit)
	- PC2001: Google Scholar reports 1,911 citing papers, but PoP returned only 967 results
	- I am not sure why PoP does not return exactly 1,000 results when more citing papers exist.
- SC98 paper: Raw metadata in `raw-sc98-cites-2022-07-04` (Excel tab and `.csv`; downloaded from PoP on July 4, 2022)
	- Removed self-citations, i.e., any paper on which Jack or Clint was a coauthor [-24 papers]
	- Screened paper titles and kept papers likely to include a _relevant_ citation to ATLAS. A paper is “relevant” if it is clearly about autotuning, is a likely application of autotuning, or is a likely user of the ATLAS library.
		- For “opaque” titles, used Google Translate and Bing Translate to assess relevance.
		- In several instances, checked the full papers to verify the presence of a _relevant_ citation; excluded additional papers that turned out to be nonrelevant [-10 papers]
	- Checked for “duplicate” papers. De-duplication is tricky because sometimes papers have the exact same title but are otherwise distinct papers (e.g., workshop version versus journal version). In some cases, retained the distinct copies; in other cases, “merged” the citation counts. For instance, arXiv or tech report versions were always merged (since they aren’t distinct peer-reviewed publications). [-13 papers]	
	- Corrected years [3 papers]	
	- Removed entries with impossible years [-5 papers]
	- Corrected entire entry [1 paper]	
	- Missing source information [1 paper]	
	- *Total “distinct” entries removed: 54 papers (leaving 944 results for analysis)*	
	- `sc98-filter-notes` has manually annotated notes; `sc98-filtered-and-merged` is derived from `sc98-filter-notes` (Excel tab and `.csv`)	
- PC2001 paper: Raw metadata in `raw-pc2001-cites-2022-07-07` (Excel tab and `.csv`; downloaded from PoP on July 7, 2022)
	- 967 citing papers initially	
	- Removed self-citations [-59 papers]
	- Removed invalid entries with impossible years, i.e., earlier than 2001 or later than 2022 [-11 papers]
	- Merged duplicates [-5 papers]
	- *Total “distinct” entries removed: 75 papers (leaving 892)*	
	- `pc2001-filter-notes` has manually annotated notes; `pc2001-filtered-and-merged` is derived from `pc2001-filter-notes` (Excel tab and `.csv`)	
- Merged the two filtered lists: 1,835 papers, including duplicates (see below)
	- Result is `merge-for-dedup`	
	- Removed duplicate publications, i.e., papers that cite both the SC98 and PC2001 papers [-75 papers]
- Final list: 1,760 papers: `final-dedup` (Excel tab and `.csv`)		
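
The mechanical parts of the filtering above (self-citation removal, year checks, and cross-list de-duplication) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the `Paper` record and its field names are hypothetical (they are not PoP's export columns), self-citations are approximated by a surname match on “Dongarra” and “Whaley” (assuming “Jack” and “Clint” are ATLAS authors Jack Dongarra and R. Clint Whaley), and duplicates are matched on a normalized title. The judgment calls (relevance screening, deciding when same-titled papers are distinct) were manual and are not modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; field names are illustrative, not PoP's export columns.
    title: str
    authors: str
    year: int


# Assumed surname test for self-citations (Jack Dongarra, R. Clint Whaley).
SELF_CITATION_SURNAMES = ("dongarra", "whaley")
LAST_YEAR = 2022  # Data were downloaded in July 2022.


def is_self_citation(paper):
    authors = paper.authors.lower()
    return any(name in authors for name in SELF_CITATION_SURNAMES)


def has_valid_year(paper, cited_year):
    # A citing paper cannot predate the cited paper or postdate the download.
    return cited_year <= paper.year <= LAST_YEAR


def filter_citations(papers, cited_year):
    """Drop self-citations and entries with impossible years."""
    return [
        p for p in papers
        if not is_self_citation(p) and has_valid_year(p, cited_year)
    ]


def title_key(title):
    # Lowercase and keep only word characters so punctuation and spacing
    # differences do not hide duplicates.
    return " ".join(re.findall(r"\w+", title.lower()))


def merge_and_dedup(*lists):
    """Merge filtered lists, keeping one copy of papers that appear in several."""
    merged = {}
    for papers in lists:
        for paper in papers:
            # setdefault keeps the first copy seen for each normalized title.
            merged.setdefault(title_key(paper.title), paper)
    return list(merged.values())


# Hypothetical example records, for illustration only.
sc98 = filter_citations([
    Paper("Tuning GEMM on GPUs", "A Author", 2010),
    Paper("Some self-citation", "JJ Dongarra, B Author", 2005),
], cited_year=1998)
pc2001 = filter_citations([
    Paper("Tuning GEMM on GPUs.", "A Author", 2010),
    Paper("Another paper", "C Author", 2015),
], cited_year=2001)
final = merge_and_dedup(sc98, pc2001)
print(len(final))  # 2
```

Keying on the normalized title keeps the first copy of a paper that cites both SC98 and PC2001, analogous to the `merge-for-dedup` to `final-dedup` step. A title collision between genuinely distinct papers (e.g., workshop versus journal versions) would still need a manual check.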
		
# For Word Cloud #

- Used this tool: [Free Word Cloud Generator](https://www.freewordcloudgenerator.com/generatewordcloud)		
- Word replacements / substitutions / cleaning		
	- Replaced “auto-tuning” with “autotuning” for consistency (31 instances)	
	- Removed these words: using, based, approach, methods, via
	- Merged approximate synonyms (variants -> canonical term)
		- model, models, modeling -> modeling
		- optimization, optimizations, optimizing, optimisation -> optimization
		- programs, program -> program
		- multiply, multiplication -> multiplication
		- high performance, high-performance, hpc, performance -> performance
		- gpus, gpu -> gpu
		- application, applications -> applications
		- automatic performance tuning, automated performance tuning, automatic tuning, automated tuning, autotuning -> autotuning
		- library, libraries -> libraries
		- generating, generator -> generator
		- algorithm, algorithms -> algorithms
		- matrices, matrix -> matrix
		- system, systems -> system
		- locality, cache -> locality
		- compilation, compilers, compiling, compile -> compilers
		- parallelism, parallelization, parallelisation, parallel -> parallel
		- tile, tiles, tiling, tiled -> tiling
		- computation, computational, computing, compute -> computational
	- End result: 11,745 words in total across all titles, of which 3,116 are unique tokens (terms)
	- This result is `titles-for-word-cloud` (Excel tab and `.csv`)
- Generator settings: Roboto Condensed font, Discoteca color palette, 100 entries
	- Exported counts from word cloud generator appear in `word-cloud-results` (Excel tab and `.csv`)
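
The title cleaning above can be sketched as a small Python normalizer. This is a hypothetical reconstruction, not the script actually used: it assumes titles are lowercased, multi-word phrases (such as “high performance” and “automatic tuning”) are rewritten before the text is split into words, and the synonym table is the one listed above. It removes only the five listed words and does not attempt to reproduce any common-word filtering the online generator may apply on its own.

```python
import re
from collections import Counter

# Phrase rewrites applied before tokenizing, longest phrases first.
PHRASES = [
    ("automatic performance tuning", "autotuning"),
    ("automated performance tuning", "autotuning"),
    ("automatic tuning", "autotuning"),
    ("automated tuning", "autotuning"),
    ("high-performance", "performance"),
    ("high performance", "performance"),
    ("auto-tuning", "autotuning"),
]

# Words removed outright.
STOP_WORDS = {"using", "based", "approach", "methods", "via"}

# Canonical term -> variants, following the synonym list above.
SYNONYM_GROUPS = {
    "modeling": ["model", "models", "modeling"],
    "optimization": ["optimization", "optimizations", "optimizing", "optimisation"],
    "program": ["programs", "program"],
    "multiplication": ["multiply", "multiplication"],
    "performance": ["hpc", "performance"],
    "gpu": ["gpus", "gpu"],
    "applications": ["application", "applications"],
    "libraries": ["library", "libraries"],
    "generator": ["generating", "generator"],
    "algorithms": ["algorithm", "algorithms"],
    "matrix": ["matrices", "matrix"],
    "system": ["system", "systems"],
    "locality": ["locality", "cache"],
    "compilers": ["compilation", "compilers", "compiling", "compile"],
    "parallel": ["parallelism", "parallelization", "parallelisation", "parallel"],
    "tiling": ["tile", "tiles", "tiling", "tiled"],
    "computational": ["computation", "computational", "computing", "compute"],
}
# Flatten into a variant -> canonical lookup table.
SYNONYMS = {v: canon for canon, variants in SYNONYM_GROUPS.items() for v in variants}


def normalize_title(title):
    """Return the cleaned list of terms for one title."""
    text = title.lower()
    for phrase, replacement in PHRASES:
        # Word boundaries stop matches inside longer words.
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", replacement, text)
    # Keep hyphenated words (e.g. "cache-aware") as single tokens.
    tokens = re.findall(r"\w+(?:-\w+)*", text)
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def word_counts(titles):
    """Count cleaned terms across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(normalize_title(title))
    return counts
```

Rewriting phrases before tokenizing matters because a single-token lookup cannot match two-word phrases such as “automatic tuning”. For example, `normalize_title("Auto-tuning high-performance GPU kernels using models")` yields `["autotuning", "performance", "gpu", "kernels", "modeling"]`.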

# Other resources #

- Google Sheet: https://docs.google.com/spreadsheets/d/11kPhpIO4XzXIibeRk0BtQkaR3kPkHLlBK5EO-N6yj8Y/edit?usp=sharing
- Excel and ODS (OpenDocument / OpenOffice) formats are also available in this collection: `cites-atlas-results.xlsx` and `cites-atlas-results.ods`
- To generate plots from the data, see `make-plots.ipynb` (Jupyter notebook with R code; you will need to clone https://bitbucket.org/rvuduc/r-vutils/ for additional plotting utility code)
