Recognising innovative companies by using a diversified stacked generalisation method for website classification – the raw results
Description
Introduction
The classification models were trained using the Classification and Regression Training package (caret) [1]. The models' parameters were tuned with 10-fold cross-validation [2].
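The tuning step above can be illustrated with a small Python sketch of 10-fold cross-validation used to choose one parameter value. In caret itself this is usually requested with `trainControl(method = "cv", number = 10)` passed to `train()`, and the record does not state the exact settings used. The toy 1-D k-NN classifier, the parameter grid and the helper names (`k_fold_indices`, `cv_accuracy`, `tune`) are hypothetical. They only illustrate the idea and do not reproduce caret's internals.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=42):
    """Shuffle row indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Hypothetical toy 1-D k-NN: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def cv_accuracy(data, k_neighbours, folds):
    """Mean accuracy over folds: each fold is held out once while the rest is used for training."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in fold]
        correct = sum(knn_predict(train, x, k_neighbours) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def tune(data, grid, k=10):
    """Return the grid value with the highest k-fold CV accuracy, plus all scores."""
    folds = k_fold_indices(len(data), k)
    scores = {g: cv_accuracy(data, g, folds) for g in grid}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic 1-D data (label 1 when x > 0.5); illustrative only, not the study's data.
    rng = random.Random(0)
    data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(100))]
    best, scores = tune(data, grid=[1, 3, 5, 7])
    print(best, scores)
```

Every candidate value is scored on the same folds, so differences in mean accuracy reflect the parameter rather than the particular split.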
Cluster parameters
Most computations were carried out on a cluster with the following specifications:
- GPU: NVIDIA Tesla P100;
- CPU: 2.0 GHz Intel® Xeon® Platinum 8167M;
- Number of GPUs: 2;
- Number of CPU cores: 28;
- Number of CPU threads: 56;
- RAM: 192 GB;
- Storage: 3 TB.
Only one model, k-nearest neighbours (k-NN), was computed on a different machine with the following specifications:
- CPU: Intel® Core™ i7-4770 @ 3.40 GHz;
- RAM: 16 GB;
- Operating system: Windows, 64-bit.
Performance statistics
All performance statistics are stored in CSV files. Each file corresponds to a single machine learning method: the file "methodName-stat.csv" contains all results for the method "methodName". All files contain the following columns:
- dataSetName – the name of the data set on which the evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD), which contains the textual description of a company; (ii) firstPageLabels refers to the second data set (LL), which contains the link labels extracted from an index page; (iii) aggregateDocument refers to the third data set (LB), which consists of a so-called big document;
- featureNo – the number of features taken into account during the evaluation;
- method – the name of the method in the caret package;
- parameters – the parameter values obtained in the tuning phase of the given classification method;
- precision – the method's precision;
- recall – the method's recall;
- fmeasure – the method's F-measure;
- error – the method's error;
- acc – the method's accuracy.
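A file with the columns listed above could be read with Python's standard `csv` module, as in the minimal sketch below. It assumes that each file is comma-separated with a header row, uses dot decimals, and names the feature-count column `featureNo` (as in the time-processing files). It also assumes that `fmeasure` is the balanced F-measure (F1). The helper names `load_stats`, `f1` and `best_by_dataset` are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Data-set codes as described in this record.
DATASET_CODES = {"firstPages": "LD", "firstPageLabels": "LL", "aggregateDocument": "LB"}

# Columns assumed to hold decimal numbers.
NUMERIC = ("precision", "recall", "fmeasure", "error", "acc")


def load_stats(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse a methodName-stat.csv file (assumed: comma-separated, header row)."""
    rows = []
    for row in csv.DictReader(lines):
        rec: Dict[str, object] = dict(row)
        for col in NUMERIC:
            rec[col] = float(row[col])
        rec["featureNo"] = int(float(row["featureNo"]))  # assumed column name
        rec["code"] = DATASET_CODES.get(row["dataSetName"], "?")
        rows.append(rec)
    return rows


def f1(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall); assumes fmeasure is F1."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_by_dataset(rows: List[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Return, for each data set, the row with the highest F-measure."""
    best: Dict[str, Dict[str, object]] = {}
    for r in rows:
        name = r["dataSetName"]
        if name not in best or r["fmeasure"] > best[name]["fmeasure"]:
            best[name] = r
    return best
```

For example, an open handle of adaboost-stat.csv (opened with `newline=""`) can be passed to `load_stats` and the result to `best_by_dataset`. This selects the feature count with the highest F-measure on each of LD, LL and LB. `f1` can be used to cross-check the stored `fmeasure` values if they are indeed F1.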
Time processing statistics
Like the performance statistics, all processing-time statistics are stored in CSV files. Each file corresponds to a single machine learning method and is named "methodName-time.csv". All files contain the following columns:
- dataSetName – the name of the data set on which the evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD), which contains the textual description of a company; (ii) firstPageLabels refers to the second data set (LL), which contains the link labels extracted from an index page; (iii) aggregateDocument refers to the third data set (LB), which consists of a so-called big document;
- featureNo – the number of features taken into account during the evaluation;
- method – the name of the method in the caret package;
- user – the user CPU time consumed by the R process while executing the method;
- system – the system CPU time consumed by the R process while executing the method;
- elapsed – the total (wall-clock) time elapsed while executing the method.
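The time files can be summarised in a similar way. The minimal Python sketch below assumes comma-separated files with a header row holding the columns listed above. It also assumes the times are in seconds, as R's `proc.time` reports them. It totals CPU time (user + system) and wall-clock time per data set and method. The helper names `load_times` and `summarise` are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_times(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse a methodName-time.csv file (assumed: comma-separated, header row, seconds)."""
    rows = []
    for row in csv.DictReader(lines):
        rows.append({
            "dataSetName": row["dataSetName"],
            "featureNo": int(float(row["featureNo"])),
            "method": row["method"],
            "user": float(row["user"]),
            "system": float(row["system"]),
            "elapsed": float(row["elapsed"]),
        })
    return rows


def summarise(rows: List[Dict[str, object]]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Total CPU (user + system) and wall-clock (elapsed) seconds per (data set, method)."""
    out = defaultdict(lambda: {"cpu": 0.0, "elapsed": 0.0})
    for r in rows:
        key = (r["dataSetName"], r["method"])
        out[key]["cpu"] += r["user"] + r["system"]
        out[key]["elapsed"] += r["elapsed"]
    return dict(out)
```

Keeping CPU and wall-clock totals separate matters because the two can diverge. Waiting on I/O adds elapsed time without CPU time, while multi-threaded work can accumulate more CPU time than elapsed time.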
For more information about user, system and elapsed time, see the R documentation of proc.time [3].
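The distinction between the three figures can be seen with a rough Python analogue. This is not R's `proc.time`: here `os.times()` supplies the user and system CPU time of the current process, and `time.perf_counter()` supplies a wall-clock reading. The helper `timed` is hypothetical. Sleeping advances elapsed time while using almost no CPU, whereas a busy loop is charged mostly as user time.

```python
import os
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, user, system, elapsed), all in seconds.

    Roughly analogous to taking proc.time() in R before and after a call:
    user/system come from os.times() (CPU time of this process),
    elapsed comes from a wall-clock counter.
    """
    t0, w0 = os.times(), time.perf_counter()
    result = fn(*args, **kwargs)
    t1, w1 = os.times(), time.perf_counter()
    return result, t1.user - t0.user, t1.system - t0.system, w1 - w0


if __name__ == "__main__":
    # Sleeping consumes wall-clock time but almost no CPU time.
    _, user, system, elapsed = timed(time.sleep, 0.2)
    print(f"sleep: user={user:.3f}s system={system:.3f}s elapsed={elapsed:.3f}s")
    # A busy computation is charged mostly as user CPU time.
    _, user, system, elapsed = timed(sum, range(5_000_000))
    print(f"sum:   user={user:.3f}s system={system:.3f}s elapsed={elapsed:.3f}s")
```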
References
[1] https://cran.r-project.org/web/packages/caret/
[2] https://topepo.github.io/caret/model-training-and-tuning.html
[3] https://stat.ethz.ch/R-manual/R-devel/library/base/html/proc.time.htm
Files
Total size: 983.1 kB
MD5 checksum | Size |
---|---|
2eb673f8a4f9c4840fc1b422d752f295 | 1.5 kB |
fd3bc3abda51cf3edb33b3a66ebdaa71 | 683 Bytes |
c8b9cdbef0647caeb4214a515a9c13c7 | 10.4 kB |
7119999d440155a877e0bbc74c4e0621 | 6.2 kB |
ce325fa70fc2a185346e15ec90d10358 | 1.1 kB |
72882e7ddfd46135a1c81500e0f53372 | 686 Bytes |
e2bc4734cbecf452e50bc9de0cd69f8d | 12.4 kB |
4b3b6145d71abb29a864755d00a9894d | 6.5 kB |
9fb53998bffb770157154395f2a90a40 | 1.9 kB |
6d198e1dbdcf26a8edae64f4daab9177 | 661 Bytes |
3383538b7ea562144a5934122b9eeed4 | 11.4 kB |
df165186dc9949363b3b56575cd43293 | 7.1 kB |
2d9dcde50c2018d4c9158f1f60893af1 | 12.5 kB |
f7e7a50ce2441016dc58fe0a86f877cd | 6.1 kB |
8d1b66f6e6368cb6c5218dfb84c9f426 | 27.5 kB |
2e3f919fd0bf59af3653f09d9cbc5911 | 15.6 kB |
484162db06d71f467462bac2afb2a965 | 5.1 kB |
074cd5619cb141230af7eeadb01066b2 | 2.7 kB |
5352789adcbab6d62c2d28766d4c6a5a | 11.4 kB |
7a3f98a4438e3e83267b62528ebc9b60 | 6.6 kB |
b9a4d8ca7759691c15ace0a5ab47b52b | 15.4 kB |
b62a433de30ea7a165e68fd9331b67dc | 6.8 kB |
9b9136aa314b9fb9543264ff2d7a9151 | 9.8 kB |
ba596a468fd854a2348bf779e536df34 | 5.5 kB |
300e584be37627fd5a1aee883e597ca8 | 55.8 kB |
81647bc0daff05df0e1911ec13e2c76c | 32.2 kB |
8807bf5483f2304228bb8a319128315f | 1.6 kB |
f461a8ef8ecabed5ee9a6876b989dc9e | 817 Bytes |
6ebebf44107fcaed206af98e3516aefa | 8.2 kB |
c03c2162c6c661983c69c3bae404014d | 6.3 kB |
9669f3a59f493f72e5ba42bc02518b96 | 49.7 kB |
5f29f41a303a7140ac9d24af1f9ef5e0 | 31.7 kB |
5e48a2436ea8afb532c93cae9bdf4d48 | 11.9 kB |
00668acf411764eb57c3bf9a8849f510 | 7.6 kB |
f403eba814a783de0b5697afb675f81a | 12.5 kB |
9e5840508fc7b2e75103010b7e7c87c6 | 7.0 kB |
9714dc2fb742af720ee5fb1684def8f9 | 569.8 kB |
444e3c18fc6112312413a2218447f145 | 1.7 kB |
6649637777fedd82e8ef921119912d42 | 888 Bytes |