# READ ME

[gs-quickstart]: https://cloud.google.com/storage/docs/quickstarts-console
[gs-patcit]: https://console.cloud.google.com/storage/browser/patcit/
[^nest]: E.g. there are many authors with a name, surname, a gender name, etc. for a unique publication

#### Data structure

The PatCit dataset has the following structure:

```bash
📁 patcit
├── 📝 README.md
├── 🗃️ frontpage_allmeta.tar
├── 🗃️ frontpage_bibliographicalreference.tar
├── 🗃️ frontpage_database.tar
├── 🗃️ frontpage_normstandard.tar
├── 🗃️ frontpage_wiki.tar
├── 🗃️ intext_bibliographicalreference.tar
└── 🗃️ intext_patent.tar
```

Each `.tar` file contains:

- the data table itself, as one or more compressed newline-delimited JSON files (`.jsonl.gz`). Large tables are split into multiple files.
- the schema of the data table, in JSON (`.json`).

#### Build a table

Database services are too diverse to detail every possible procedure. Instead, the general guidelines below apply to any database service.

1. Download the tar file(s) corresponding to the table(s) you are interested in.
2. Untar the file(s) (e.g. `tar -xvf <file>.tar` on macOS/Linux).
3. Unzip the data file(s) (e.g. `gunzip *.jsonl.gz` on macOS/Linux). This step is optional, since some database services can build tables directly from zipped data files.
4. Build the table in your SQL-like database service using the provided schema.

#### Any issue?

If you run into any trouble, feel free to raise an issue on the PatCit GitHub repository: https://github.com/cverluise/PatCit.
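
#### Example: inspecting a table with Python

Steps 2 and 3 of "Build a table" can also be done without shell tools. The sketch below is a minimal illustration, not an official loader. The helper names (`extract_table`, `load_schema`, `iter_records`) are hypothetical, and the code only assumes what is stated above: each archive holds `.jsonl.gz` data file(s) and a `.json` schema. The file names inside the archives and the schema layout are not assumed.

```python
import gzip
import json
import tarfile
from pathlib import Path


def extract_table(tar_path, out_dir):
    """Untar one table archive (step 2) and return the extracted file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        # Only extract archives you trust: extractall writes every member to disk.
        archive.extractall(out_dir)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())


def load_schema(paths):
    """Return the parsed content of the first `.json` file found, or None."""
    for path in paths:
        if path.suffix == ".json":
            with open(path, encoding="utf-8") as handle:
                return json.load(handle)
    return None


def iter_records(paths):
    """Yield one dict per non-empty line of every `.jsonl.gz` chunk.

    The files are decompressed on the fly, so the explicit gunzip
    step (step 3) is not needed here.
    """
    for path in paths:
        if path.name.endswith(".jsonl.gz"):
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        yield json.loads(line)
```

For example, after downloading `frontpage_wiki.tar`, calling `paths = extract_table("frontpage_wiki.tar", "frontpage_wiki")` followed by `load_schema(paths)` and `next(iter_records(paths))` lets you check the schema and a first record before building the table (step 4) in the database service of your choice.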