Published October 20, 2017 | Version v0.1.14
Software Open

diana-hep/spark-root: Release 0.1.14

  • 1. CERN

Description

Updates for experimental package:

  • Polishing of I/O
  • Added optimization passes over the constructed intermediate schema:
    • Remove Empty Rows pass.
    • Remove Nulls pass, in two versions: Soft and Hard. Hard removes every branch that is not splittable and contains null as one of its fields. Soft removes nulls without checking for this "branch safety".
    • Schema Pruning: prunes as deep as Spark allows. Takes effect together with Apache Spark PR: https://github.com/apache/spark/pull/16578
    • All optimizations except SoftRemove are enabled by default and can be turned on or off with spark.sqlContext.read.option("OptimizationName", value), where value is true/false or "on"/"off".
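
As a rough illustration of the on/off switches described above, the sketch below builds a map of reader options in plain Scala. The helper object OptimizationOptions and its methods are hypothetical and not part of spark-root. The key "OptimizationName" is the placeholder used in these notes, not a confirmed option name, and the data source string in the comment is an assumption based on the package name.

```scala
// Hypothetical helper, not part of spark-root: builds reader options
// that switch optimization passes on or off.
object OptimizationOptions {

  // Convert a Boolean switch into the "on"/"off" string form
  // mentioned in the release notes.
  def flag(enabled: Boolean): String = if (enabled) "on" else "off"

  // Build an options map from (pass name -> enabled) pairs.
  // Pass names are supplied by the caller; none are assumed here.
  def build(passes: (String, Boolean)*): Map[String, String] =
    passes.map { case (name, enabled) => name -> flag(enabled) }.toMap
}

// Possible use with Spark and spark-root on the classpath (not run here).
// The format string and file path are assumptions for illustration only:
//
//   val opts = OptimizationOptions.build("OptimizationName" -> false)
//   val df = spark.sqlContext.read
//     .format("org.dianahep.sparkroot.experimental")
//     .options(opts)
//     .load("path/to/file.root")
```

Passing the switches as one map keeps the enabled and disabled passes visible in a single place; a single pass can equally be toggled with one .option call as shown in the notes.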

Updates for org.dianahep.sparkroot package:

  • Default parallelism is the number of files.

Files

diana-hep/spark-root-v0.1.14.zip (228.8 kB)
md5:3c126ddf1989c683739d536b7cca6e64
