The data are collections of SMT scripts written in the SMT-LIB language, the standard input format used in SMT solving research. They consist of SMT scripts that take a relatively long time to solve, collected from the symbolic execution tools angr and KLEE. These are the data used in "Boosting Symbolic Execution via Constraint Solving Time Prediction: An Experience Report", and we share them with the community to support further studies.

The data include four datasets: constraint models generated from GNU Coreutils using angr, from GNU Coreutils using KLEE, and from BusyBox using angr, plus models taken from SMT-comp.

For the data we collected ourselves, the scripts were captured while the symbolic execution tools analyzed each binary file. Every query that took more than 1 second to solve was recorded together with its constraint model, the name of the analyzed file, and its solving time. The index of each SMT script reflects the order in which the queries were solved. The data are saved in JSON format. Because solving times measured during symbolic execution may be inaccurate or unstable, we re-measured the solving time of each model with a standalone z3 solver. These times are also saved in the file under the key solving_time_dic.
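A minimal Python sketch for reading the re-measured solving times, assuming each JSON file is a top-level object whose solving_time_dic key maps script indices (stored as string keys) to times in seconds. This layout is an assumption, not a documented schema; check the released files for the actual structure and other keys. The function names below are hypothetical.

```python
import json
from typing import Dict


def load_solving_times(text: str) -> Dict[int, float]:
    """Return {script index: z3 solving time in seconds}, ordered by index.

    Assumed layout (not confirmed by the dataset description): the top-level
    JSON object holds a "solving_time_dic" mapping index -> seconds.
    """
    data = json.loads(text)
    times = data["solving_time_dic"]
    # JSON object keys are always strings, so convert indices back to int
    # before sorting; otherwise "10" would sort before "2".
    ordered = sorted(times.items(), key=lambda kv: int(kv[0]))
    return {int(k): float(v) for k, v in ordered}


def slow_queries(times: Dict[int, float], threshold: float = 1.0) -> Dict[int, float]:
    """Keep only queries whose re-measured time still exceeds the threshold.

    The 1-second default mirrors the cutoff used when recording queries.
    """
    return {i: t for i, t in times.items() if t > threshold}
```

For a file on disk, one would call `load_solving_times` on the file's contents, e.g. after `open(path).read()`. Re-applying the 1-second cutoff to the re-measured times can show how many queries remain slow once measurement noise from the symbolic execution run is removed.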

For SMT-comp, more information is available at https://smt-comp.github.io/2020/. In our experiments we mainly use the non-incremental QF_BV benchmarks. The original data can be downloaded from https://www.starexec.org/starexec/secure/explore/spaces.jsp?id=404954. Alternatively, you can use our released version of this dataset, which includes solving times, to replay our prediction results.

The data may further be used for solving time prediction, solver selection, and path selection in symbolic execution.