=============================================
Using the Sumatra API within your own scripts
=============================================

Using the ``smt run`` command is quick and convenient, but it does require you to
change the way you launch your simulations/analyses.

One way to avoid this, if you use Python, is to use the ``sumatra`` package within your own
scripts to perform the record-keeping tasks performed by ``smt run``.

You may also wish to write your own custom script for creating a Sumatra project,
instead of using ``smt init``, but we do not cover this scenario here.
    
We will start with a simple example script, a dummy simulation, that reads a
parameter file, generates some random numbers, and writes some data to file::

    import numpy
    import sys
    
    def main(parameters):
        numpy.random.seed(parameters["seed"])
        distr = getattr(numpy.random, parameters["distr"])
        data = distr(size=parameters["n"])    
        output_file = "example.dat"
        numpy.savetxt(output_file, data)
    
    parameter_file = sys.argv[1]
    parameters = {}
    execfile(parameter_file, parameters) # this way of reading parameters
                                         # is not necessarily recommended
    main(parameters)
    
Let's suppose this script is in a file named ``myscript.py``, and that we have a
parameter file named ``defaults.param``, which contains::

    seed = 65784
    distr = "uniform"
    n = 100
    
Without Sumatra, we would normally run this script using something like::

    $ python myscript.py defaults.param

To run the script using the ``smt`` command line tool, we would use::

    $ smt run --reason="reason for running this simulation" defaults.param
    
(This assumes we have previously used ``smt init`` or ``smt configure`` to specify that
our executable is ``python`` and our main file is ``myscript.py``.)

To benefit from the functionality of Sumatra without having to use ``smt run``, we
have to integrate the steps performed by ``smt run`` into our script.

First, we have to load the Sumatra project::

    from sumatra.projects import load_project
    project = load_project()
    
We're going to want to record the simulation duration, so we import the standard
Python ``time`` module and record the start time::

    import time
    start_time = time.time()
    
We need to slightly modify the procedure for reading parameters. Sumatra stores
the parameters for later use in searching and comparison, so they need to be
transformed into a form Sumatra can use. This is very simple, we just replace
the ``execfile()`` call with a ``build_parameters()`` call::

    from sumatra.parameters import build_parameters
    parameters = build_parameters(parameter_file)
    
Now we create a new ``Record`` object, telling it that the script is the
current file; this automatically registers information about the simulation environment::
    
    record = project.new_record(parameters=parameters,
                                main_file=__file__,
                                reason="reason for running this simulation")

Now comes the main body of the simulation, which is unchanged except that we
take the opportunity to give the output data file a more informative name by
adding the record label to the parameter file::

    output_file = "%s.dat" % parameters["sumatra_label"]

At the end of the simulation, we
calculate the simulation duration and search for newly created files::

    record.duration = time.time() - start_time
    record.output_data = record.datastore.find_new_data(record.timestamp)

Now we add this simulation record to the project, and save the project::

    project.add_record(record)
    project.save()
    
Putting this all together::

    import numpy
    import sys
    import time
    from sumatra.projects import load_project
    from sumatra.parameters import build_parameters
    
    def main(parameters):
        numpy.random.seed(parameters["seed"])
        distr = getattr(numpy.random, parameters["distr"])
        data = distr(size=parameters["n"])    
        output_file = "%s.dat" % parameters["sumatra_label"]
        numpy.savetxt(output_file, data)
    
    parameter_file = sys.argv[1]
    parameters = build_parameters(parameter_file)
    
    project = load_project()    
    record = project.new_record(parameters=parameters,
                                main_file=__file__,
                                reason="reason for running this simulation")
    parameters.update({"sumatra_label": record.label})
    start_time = time.time()
    
    main(parameters)
    
    record.duration = time.time() - start_time
    record.output_data = record.datastore.find_new_data(record.timestamp)
    project.add_record(record)

    project.save()

Now you can run the simulation in the original way::

    python myscript.py defaults.param
    
and still have the simulation recorded in your Sumatra project. For such a
simple script and simple run environment there is no advantage to doing it this
way: ``smt run`` is much simpler. However, if you already have a fairly complex run
environment, this provides a straightforward way to integrate Sumatra's
functionality into your existing system.

You will have noticed that much of the Sumatra code you have to add is
effectively boilerplate, which will be the same for all your scripts. To save time,
and typing therefore, Sumatra provides a ``@capture`` decorator for your
``main()`` function::

    import numpy
    import sys
    from sumatra.parameters import build_parameters
    from sumatra.decorators import capture
    
    @capture
    def main(parameters):
        numpy.random.seed(parameters["seed"])
        distr = getattr(numpy.random, parameters["distr"])
        data = distr(size=parameters["n"])
        numpy.savetxt("%s.dat" % parameters["sumatra_label"], data)
  
    parameter_file = sys.argv[1]
    parameters = build_parameters(parameter_file)
    main(parameters)

This is now hardly any longer than the original script.