Published January 11, 2019 | Version 1.0.0
Dataset Open

Recognising innovative companies by using a diversified stacked generalisation method for website classification – the raw results

  • 1. National Information Processing Institute

Description

Introduction

The classification models were trained using the Classification and Regression Training package (caret) [1]. The models' parameters were tuned with 10-fold cross-validation [2].
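To make the tuning procedure concrete, the sketch below shows the general idea of k-fold cross-validation in plain Python. It is only an illustration, not caret's implementation: the function names (k_fold_indices, cross_validate, tune) and the score_fn callback are hypothetical, and the fold assignment here is a simple shuffled split.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(score_fn, n_samples, candidate, k=10):
    """Mean score of one candidate parameter value over k train/test splits."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        scores.append(score_fn(candidate, train_idx, test_idx))
    return mean(scores)


def tune(score_fn, n_samples, candidates, k=10):
    """Return the candidate with the highest mean cross-validated score."""
    return max(candidates, key=lambda c: cross_validate(score_fn, n_samples, c, k))
```

Here score_fn(candidate, train_idx, test_idx) is assumed to fit a model with the candidate parameter value on the training indices and return its score on the test indices.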

Cluster parameters

Most computations were carried out on a cluster having the following parameters:

  • GPU: NVIDIA Tesla P100;
  • CPU: 2.0 GHz Intel® Xeon® Platinum 8167M;
  • The number of GPUs: 2;
  • The number of CPU cores: 28;
  • The number of CPU threads: 56;
  • RAM: 192 GB;
  • Storage: 3 TB.

Only one model (k-nn) was computed on a separate machine with the following parameters:

  • Processor: Intel(R) Core(TM) i7-4770 CPU @ 3.40 GHz;
  • RAM: 16 GB;
  • Windows 64 bit.

Performance statistics

All performance statistics are stored in CSV files. Each file corresponds to one machine learning method: the file "methodName-stat.csv" contains all data for the method "methodName". All files contain the following columns:

  • dataSetName – the name of the data set on which the evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD), which contains the textual description of a company; (ii) firstPageLabels refers to the second data set (LL), which contains the link labels extracted from an index page; (iii) aggregateDocument refers to the third data set (LB), which consists of a so-called big document;
  • featureNo – the number of features that were taken into account during evaluation;
  • method – the name of the function in the caret package;
  • parameters – the parameter values obtained from the tuning phase of a given classification method;
  • precision – the value of the method's precision;
  • recall – the value of the method's recall;
  • fmeasure – the value of the method's F-measure;
  • error – the value of the method's error;
  • acc – the value of the method's accuracy.
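As an illustration of how these columns might be used, the following minimal Python sketch reads a statistics file and picks, for each data set, the row with the highest F-measure. It assumes that the files are comma-separated, that they have a header row using exactly the column names listed above, and that the number-of-features column is named featureNo as in the time files. The SAMPLE rows are invented placeholders, not values from the data set, and the function names are hypothetical.

```python
import csv
import io

# Invented placeholder rows, for illustration only; not values from the data set.
SAMPLE = """dataSetName,featureNo,method,parameters,precision,recall,fmeasure,error,acc
firstPages,100,knn,k=5,0.70,0.60,0.65,0.30,0.70
firstPages,200,knn,k=7,0.75,0.65,0.70,0.25,0.75
aggregateDocument,100,knn,k=5,0.80,0.70,0.75,0.20,0.80
"""

NUMERIC = ("precision", "recall", "fmeasure", "error", "acc")


def load_stats(handle):
    """Read a statistics file and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["featureNo"] = int(row["featureNo"])
        for name in NUMERIC:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def best_per_data_set(rows):
    """Map each dataSetName to its row with the highest fmeasure."""
    best = {}
    for row in rows:
        name = row["dataSetName"]
        if name not in best or row["fmeasure"] > best[name]["fmeasure"]:
            best[name] = row
    return best


rows = load_stats(io.StringIO(SAMPLE))
for name, row in best_per_data_set(rows).items():
    print(name, row["featureNo"], row["fmeasure"])
```

To run this against a downloaded file, replace io.StringIO(SAMPLE) with, for example, open("adaboost-stat.csv", newline=""). If the real files use a different separator or different header spellings, the reader arguments and column names need to be adjusted accordingly.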

Time processing statistics

All time processing statistics, like the performance statistics, are stored in CSV files. Each file corresponds to one machine learning method and is named "methodName-time.csv". All files contain the following columns:

  • dataSetName – the name of the data set on which the evaluation was carried out; there are three possible values: (i) firstPages refers to the first data set (LD), which contains the textual description of a company; (ii) firstPageLabels refers to the second data set (LL), which contains the link labels extracted from an index page; (iii) aggregateDocument refers to the third data set (LB), which consists of a so-called big document;
  • featureNo – the number of features that were taken into account during evaluation;
  • method – the name of the function in the caret package;
  • user – the user CPU time spent executing a method as an R process;
  • system – the system CPU time spent executing a method as an R process;
  • elapsed – the total time elapsed while executing a method as an R process.

For more information about user, system and total elapsed time, see the R documentation for proc.time [3].
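The timing columns can be aggregated in the same way as the performance columns. The sketch below sums the elapsed time per data set. It assumes comma-separated files with a header row that uses the column names listed above, and times in seconds as reported by R's proc.time. The SAMPLE rows are invented placeholders, not measurements from the data set, and total_elapsed is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows, for illustration only; not measurements from the data set.
SAMPLE = """dataSetName,featureNo,method,user,system,elapsed
firstPages,100,knn,12.5,0.5,13.5
firstPages,200,knn,20.0,1.0,21.5
firstPageLabels,100,knn,3.0,0.25,3.5
"""


def total_elapsed(handle):
    """Sum the elapsed column for each dataSetName."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row["dataSetName"]] += float(row["elapsed"])
    return dict(totals)


totals = total_elapsed(io.StringIO(SAMPLE))
for name, seconds in totals.items():
    print(name, seconds)
```

Note that elapsed is wall-clock time, so it does not have to equal the sum of user and system; on a multi-core machine the CPU times can exceed it.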

References

[1] https://cran.r-project.org/web/packages/caret/

[2] https://topepo.github.io/caret/model-training-and-tuning.html

[3] https://stat.ethz.ch/R-manual/R-devel/library/base/html/proc.time.htm

Files

adaboost-stat.csv

Files (983.1 kB)

Checksum Size
md5:2eb673f8a4f9c4840fc1b422d752f295 1.5 kB
md5:fd3bc3abda51cf3edb33b3a66ebdaa71 683 Bytes
md5:c8b9cdbef0647caeb4214a515a9c13c7 10.4 kB
md5:7119999d440155a877e0bbc74c4e0621 6.2 kB
md5:ce325fa70fc2a185346e15ec90d10358 1.1 kB
md5:72882e7ddfd46135a1c81500e0f53372 686 Bytes
md5:e2bc4734cbecf452e50bc9de0cd69f8d 12.4 kB
md5:4b3b6145d71abb29a864755d00a9894d 6.5 kB
md5:9fb53998bffb770157154395f2a90a40 1.9 kB
md5:6d198e1dbdcf26a8edae64f4daab9177 661 Bytes
md5:3383538b7ea562144a5934122b9eeed4 11.4 kB
md5:df165186dc9949363b3b56575cd43293 7.1 kB
md5:2d9dcde50c2018d4c9158f1f60893af1 12.5 kB
md5:f7e7a50ce2441016dc58fe0a86f877cd 6.1 kB
md5:8d1b66f6e6368cb6c5218dfb84c9f426 27.5 kB
md5:2e3f919fd0bf59af3653f09d9cbc5911 15.6 kB
md5:484162db06d71f467462bac2afb2a965 5.1 kB
md5:074cd5619cb141230af7eeadb01066b2 2.7 kB
md5:5352789adcbab6d62c2d28766d4c6a5a 11.4 kB
md5:7a3f98a4438e3e83267b62528ebc9b60 6.6 kB
md5:b9a4d8ca7759691c15ace0a5ab47b52b 15.4 kB
md5:b62a433de30ea7a165e68fd9331b67dc 6.8 kB
md5:9b9136aa314b9fb9543264ff2d7a9151 9.8 kB
md5:ba596a468fd854a2348bf779e536df34 5.5 kB
md5:300e584be37627fd5a1aee883e597ca8 55.8 kB
md5:81647bc0daff05df0e1911ec13e2c76c 32.2 kB
md5:8807bf5483f2304228bb8a319128315f 1.6 kB
md5:f461a8ef8ecabed5ee9a6876b989dc9e 817 Bytes
md5:6ebebf44107fcaed206af98e3516aefa 8.2 kB
md5:c03c2162c6c661983c69c3bae404014d 6.3 kB
md5:9669f3a59f493f72e5ba42bc02518b96 49.7 kB
md5:5f29f41a303a7140ac9d24af1f9ef5e0 31.7 kB
md5:5e48a2436ea8afb532c93cae9bdf4d48 11.9 kB
md5:00668acf411764eb57c3bf9a8849f510 7.6 kB
md5:f403eba814a783de0b5697afb675f81a 12.5 kB
md5:9e5840508fc7b2e75103010b7e7c87c6 7.0 kB
md5:9714dc2fb742af720ee5fb1684def8f9 569.8 kB
md5:444e3c18fc6112312413a2218447f145 1.7 kB
md5:6649637777fedd82e8ef921119912d42 888 Bytes
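The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_checksum are hypothetical, and the listing does not show which checksum belongs to which file, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:2eb6...'."""
    # Drop an optional 'md5:' prefix and ignore letter case.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_checksum("adaboost-stat.csv", "md5:...") returns True only if the local copy has exactly the listed digest, where the second argument is the checksum shown for that file.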