Published August 3, 2022 | Version v1 | Dataset | Open
Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft
Creators
Contributors
Supervisors:
- 1. Humboldt-Universität zu Berlin
- 2. University of Kassel
Description
- Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
- Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.
- ger_train.csv – The German training set as CSV file.
- ger_validation.csv – The German validation set as CSV file.
- en_test.csv – The English test set as CSV file.
- en_train.csv – The English training set as CSV file.
- en_validation.csv – The English validation set as CSV file.
- splitting.py – The Python code for splitting a dataset into training, test, and validation sets.
- DataSetTrans_de.csv – The final German dataset as a CSV file.
- DataSetTrans_en.csv – The final English dataset as a CSV file.
- translation.py – The Python code for translating the cleaned dataset.
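As an illustration of the kind of split that splitting.py produces, here is a minimal sketch using only the Python standard library. It is not the code from splitting.py: the function name `split_rows`, the default fractions, and the fixed seed are all assumptions chosen for the example, and the record does not state which ratios the published files actually use.

```python
import random


def split_rows(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into (train, validation, test) lists.

    The fractions and the seed are illustrative assumptions, not the
    values used for the published CSV files.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains goes to the test set
    return train, validation, test
```

Fixing the seed means that rerunning the split on the same input yields the same three subsets, which matters when the German and English versions of a dataset are expected to stay aligned row for row.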
Files (866.4 MB)
| MD5 checksum | Size |
|---|---|
| md5:33c171cb8be54533dc94c4f27d426960 | 166.7 MB |
| md5:def2b046b3f4617db27945983817ad30 | 397.8 kB |
| md5:284adf9cb77c07cbc51c0e1520ff3894 | 184.5 MB |
| md5:c7cff475f4ae44819259f645476e2758 | 165.0 MB |
| md5:a4491c0d2fca781eaa93ff66c64bcb01 | 36.6 MB |
| md5:db8dbc1005a119f4a5017b9202c54513 | 92.3 MB |
| md5:932e2f06114c58fa76341b13e4609464 | 36.2 MB |
| md5:8b2a07697b869d3ec74e95ff89576289 | 40.8 MB |
| md5:2e7f7d370b1d6a7f24490d2d8f788064 | 103.2 MB |
| md5:1924c883a2bd1fee39d72a39f6ea9311 | 40.6 MB |
| md5:f7b2240dc5f9a6c92849f94c463e16ca | 5.2 kB |
| md5:f0cbe75a49a888d5804e030504f8dc71 | 2.0 kB |
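
The MD5 values in the table can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the table does not pair checksums with file names, the caller has to supply the expected value for the file being checked.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large CSV files do not have to
    fit into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may carry the "md5:" prefix used in the table.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum("translation.py", "md5:...")`, with the elided part replaced by the listed checksum for that file, returns `True` only if the local copy is byte-for-byte identical to the published one.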