Conference paper Open Access
Faul, Anita C.; Pilikos, Georgios
There are several challenges with which data present us nowadays. For one there is the abundance of data and the necessity to extract the essential information from it. When tackling this task a balance has to be struck between putting aside irrelevant information and keeping the relevant one without getting lost in detail, known as over-fitting. The law of parsimony, also known as Occam’s razor should be a guiding principle, keeping models simple while explaining the data. The next challenge is the fact that the data samples are not static. New samples arrive constantly through the pipeline. Therefore, there is a need for models which update themselves as the new sample becomes available. The models should be flexible enough to become more complex should this be necessary. In addition the models should inform us which samples need to be collected so that the collection process becomes most informative. Another challenge are the conclusions we draw from the data. After all, as popularized by Mark Twain: "There are three kinds of lies: lies, damned lies, and statistics." An objective measure of confidence is needed to make generalized statements The last challenge is the analysis. Can we build systems which inform us of the underlying structure and processes which gave rise to the data? Moreover, it is not enough to discover the structure and processes, we also need to add meaning to it. Here different disciplines need to work together.