Have done recent work on the quality of the IRAF process, but it's still very time-consuming to work with & adapt for new data in CL, e.g.:
Filenames repeated numerous times throughout the DR script.
Inconsistent support for passing lists of files to different steps.
Re-processing means deleting files, commenting bits out etc.
Easy to make mistakes, and they can be costly.
Script can end up not reflecting what was actually done.
Calibrations have to be selected & specified by user for each step.
No global settings for interactivity, error propagation etc.
Lots of routine boilerplate changes to parameter defaults.
Opted to re-write the processing sequence in Python instead of spending more time on monkey work.
Support code to allow scripting the IRAF process simply in Python.
# Use an example file that looks like this in FITS:
from astropy.io import fits
fits.info('ergS20120827S0066.fits')
Filename: ergS20120827S0066.fits
No.  Name     Type         Cards  Dimensions   Format
0    PRIMARY  PrimaryHDU     251  ()
1    MDF      BinTableHDU     43  1500R x 8C   [1J, 1E, 1E, 5A, 1J, 1J, 1J, 1J]
2    SCI      ImageHDU      1583  (2879, 743)  float32
3    VAR      ImageHDU      1583  (2879, 743)  float32
4    DQ       ImageHDU      1583  (2879, 743)  int16 (rescales to uint16)
5    SCI      ImageHDU      1583  (2879, 745)  float32
6    VAR      ImageHDU      1583  (2879, 745)  float32
7    DQ       ImageHDU      1583  (2879, 745)  int16 (rescales to uint16)
from ndmapper.data import *
# Open the FITS file as a list of NDData instances:
df = DataFile('ergS20120827S0066.fits')
df
DataFile 'ergS20120827S0066.fits' (len 2)
# The FITS primary header lives in a file-level `meta` attribute:
df.meta[:10] # print first 10 lines
SIMPLE  =                    T / Fits standard
BITPIX  =                   16 / Bits per pixel
NAXIS   =                    0 / Number of axes
EXTEND  =                    T / File may contain extensions
ORIGIN  = 'NOAO-IRAF FITS Image Kernel July 2003' / FITS file originator
DATE    = '2016-01-22T23:43:30' / Date FITS file was generated
IRAF-TLM= '2016-01-29T22:21:56' / Time of last modification
COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
INSTRUME= 'GMOS-S  '           / Instrument used to acquire data
# The DataFile behaves as a list of NDDataArray instances, with the SCI, VAR & DQ FITS
# extensions translated to data, uncertainty & flags, respectively:
df[0].data # SCI
array([[ 16.32850838,  37.23521423,  50.67963028, ...,   0.        ,   0.        ,   0.        ],
       [ 23.59534264,  14.43008041,  45.55400467, ...,   0.        ,   0.        ,   0.        ],
       [  0.        ,   0.        ,  23.65302849, ...,   0.        ,   0.        ,   0.        ],
       ...,
       [ 64.25991821,  48.29379272,  35.84579468, ...,   0.        ,   0.        ,   0.        ],
       [ 75.59520721,  40.56682205,  18.23557091, ...,   0.        ,   0.        ,   0.        ],
       [ 79.64089966,  61.27500153,  83.26106262, ...,   0.        ,   0.        ,   0.        ]],
      dtype=float32)
df[0].uncertainty # VAR
<astropy.nddata.nduncertainty.StdDevUncertainty at 0x7f1576990d10>
df[0].flags # DQ
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
df[0].meta[:10]
XTENSION= 'IMAGE   '           / Image extension
BITPIX  =                  -32 / Bits per pixel
NAXIS   =                    2 / Number of axes
NAXIS1  =                 2879 / Axis length
NAXIS2  =                  743 / Axis length
PCOUNT  =                    0 / No 'random' parameters
GCOUNT  =                    1 / Only one group
EXTNAME = 'SCI     '           / Extension name
EXTVER  =                    1 / Extension version
INHERIT =                    F / Inherits global header
# The name is kept in a filename parser object:
# (which uses a configurable regexp to recognize the base name)
df.filename
FileName 'ergS20120827S0066.fits'
df.filename.dir, df.filename.prefix, df.filename.base, df.filename.ext # etc.
('', 'erg', 'S20120827S0066', 'fits')
# Convert to a new name with modifiers:
FileName(df.filename, prefix='t', suffix='_sum', dirname='')
FileName 'tergS20120827S0066_sum.fits'
df.filename == '../talk/ergS20120827S0066.fits' # path equivalence
True
str(df)
'ergS20120827S0066.fits'
from astropy.nddata import NDData
df = DataFile('arthur.fits', mode='overwrite')  # 'read', 'update', 'new' or 'overwrite'
df.append(NDData([1,2,3,4]))
df.append(NDData([5,6,7]))
df
DataFile 'arthur.fits' (len 2)
df[0].ident # FITS EXTVER; can be changed
1
df.save()
fits.info('arthur.fits')
Filename: arthur.fits
No.  Name     Type        Cards  Dimensions  Format
0    PRIMARY  PrimaryHDU      4  ()
1    SCI      ImageHDU        9  (4,)        int64
2    SCI      ImageHDU        9  (3,)        int64
FITS I/O is abstracted into a replaceable back end, but NDData instances' ident must be an integer (the FITS EXTVER).
Binary tables get propagated as AstroPy Table instances.
Their information is preserved on loading & saving.
But no public interface is defined for accessing them yet.
Unrecognized image extensions are ignored for now (expect access via something like an "extras" attribute).
Pixel data are lazily loaded, avoiding duplicate memory use when just running an IRAF task etc.
This uses an NDDataArray subclass, shamelessly called NDLater, which reads pixels via astropy.io.fits on demand.
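As a rough illustration of the lazy-loading idea (a minimal sketch only; LazyPixels is a hypothetical stand-in, not the actual NDLater implementation):

from astropy.io import fits

class LazyPixels(object):
    """Defer reading a FITS data array until first access (illustrative)."""
    def __init__(self, filename, index):
        self._filename, self._index = filename, index
        self._data = None
    @property
    def data(self):
        # Read from disk only on first access, then cache the array:
        if self._data is None:
            self._data = fits.getdata(self._filename, self._index)
        return self._data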
# Lists of DataFile objects can be instantiated using a DataFileList object:
dfl = DataFileList(['S20120827S0066.fits', 'S20120827S0067.fits',
                    'S20120827S0068.fits'], prefix='erg')
dfl.append(df)
dfl
[DataFile 'ergS20120827S0066.fits' (len 2),
 DataFile 'ergS20120827S0067.fits' (len 2),
 DataFile 'ergS20120827S0068.fits' (len 2),
 DataFile 'arthur.fits' (len 2)]
Serves dual purpose of tracking lists of filenames & accessing the data.
Functionality is currently minimal, but operations on multiple DataFile objects are envisaged.
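For example, the same object can supply filenames for IRAF or access to the pixels (a sketch using only calls shown above):

# As a list of filename strings, eg. for passing to an IRAF task:
names = [str(df) for df in dfl]
# As in-memory data, eg. the first extension of the first file:
pixels = dfl[0][0].data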
from ndmapper.iraf_task import run_task
dfl1 = DataFileList(['ergS20120827S0066.fits', 'ergS20120827S0067.fits',
'ergS20120827S0068.fits'])
dfl2 = dfl1[::-1] # flip it for fun
dfout1 = run_task('gemini.gemtools.gemarith',
                  inputs={'operand1': dfl1, 'operand2': dfl2},
                  outputs={'result': '@operand1'},
                  suffix='_test1', MEF_ext=False, op='+')
----- START run_task(gemini.gemtools.gemarith)  2016-03-24 22:38:38
GEMARITH begin (this task relies on gemexpr)
GEMARITH PHU copy: ergS20120827S0066.fits[0] --> ergS20120827S0066_test1.fits[0,APPEND])
GEMARITH Going to copy MDF from ergS20120827S0066.fits[1]
GEMARITH gemarith: done
GEMARITH begin (this task relies on gemexpr)
GEMARITH PHU copy: ergS20120827S0067.fits[0] --> ergS20120827S0067_test1.fits[0,APPEND])
GEMARITH Going to copy MDF from ergS20120827S0067.fits[1]
GEMARITH gemarith: done
GEMARITH begin (this task relies on gemexpr)
GEMARITH PHU copy: ergS20120827S0068.fits[0] --> ergS20120827S0068_test1.fits[0,APPEND])
GEMARITH Going to copy MDF from ergS20120827S0068.fits[1]
GEMARITH gemarith: done
END run_task(gemini.gemtools.gemarith)  2016-03-24 22:38:39 -----
dfout1
{'result': [DataFile 'ergS20120827S0066_test1.fits' (len 2),
            DataFile 'ergS20120827S0067_test1.fits' (len 2),
            DataFile 'ergS20120827S0068_test1.fits' (len 2)]}
Can loop over files and FITS extensions, handling several bookkeeping tasks.
Input DataFile instances get auto-saved for IRAF if they might have changed in memory.
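Per the last point, for instance (a sketch of the intended behaviour, re-using the objects above; the pixel edit is arbitrary):

# Modify a pixel in memory; run_task() re-saves the file before IRAF runs:
dfl1[0][0].data[0, 0] = 0.0
dfout2 = run_task('gemini.gemtools.gemarith',
                  inputs={'operand1': dfl1, 'operand2': dfl2},
                  outputs={'result': '@operand1'},
                  suffix='_test2', MEF_ext=False, op='+')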
Thin wrappers for IRAF tasks, using run_task().
Simplify the old API & set lots of parameter defaults.
Obey global settings, e.g. for interactivity.
Look virtually the same as pure Python steps (with docstrings).
Currently have 2 simple proof-of-concept steps in pure Python.
Also fairly easy to define.
Determine file names & modes, loop over files & NDData instances and do the appropriate calculation (see the sketch after this list).
Simplify further with a meta-decorator for iteration etc.?
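A rough sketch of the pattern (scale_data is hypothetical, not a library step; it assumes DataFile accepts a FileName and iterates over its NDData instances, consistent with the list-like behaviour shown earlier):

from astropy.nddata import NDData
from ndmapper.data import DataFile, FileName

def scale_data(inputs, factor=2.0):
    """Multiply each input's pixel arrays by a constant (illustrative)."""
    outputs = []
    for in_df in inputs:
        # Determine the output file name & mode from the input name:
        out_df = DataFile(FileName(in_df.filename, suffix='_mult'),
                          mode='overwrite')
        # Loop over the NDData instances & do the calculation:
        for ndd in in_df:
            out_df.append(NDData(ndd.data * factor, meta=ndd.meta))
        out_df.save()
        outputs.append(out_df)
    return outputs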
Option to overwrite or re-use existing files automatically.
Hierarchical instrument library exposes inherited processing steps or overloaded, mode-specific versions as needed.
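Schematically, the idea is something like the following (a hypothetical layout, not the actual module contents):

# ndmapper/lib/gmos/__init__.py
#     def prepare(inputs): ...                  # generic GMOS step
#
# ndmapper/lib/gmos/spec/ifu/__init__.py
#     from ndmapper.lib.gmos import prepare     # inherited unchanged
#     def extract_spectra(inputs): ...          # IFU-specific overload
#
# so "from ndmapper.lib.gmos.spec.ifu import *" (as in the example below)
# picks up the appropriate version of each step.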
Modules calibrations and services provide automated calibration matching & file downloads (currently for Gemini).
Calibrations matched recursively to get everything needed up front.
Both file download & calibration matching steps become no-ops when previously completed.
Support functions allow 1-line operations (example below).
Checksums verified automatically when downloading; no bad copies.
Only ~15% of code in Gemini-specific back end.
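For example, downloading files takes one line (the same call used in the script below), with checksum verification and caching handled automatically:

from ndmapper.services import download_files
files = download_files(['S20120827S0066.fits'], server='gemini',
                       dirname='raw')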
# A draft Python version of my GMOS IFU data reduction example,
# by James E.H. Turner, Jan. 2016.
import ndmapper
from ndmapper.data import FileName, DataFile, DataFileList
from ndmapper.services import look_up_cals, download_files
from ndmapper.calibrations import cal_entries, associate_cals
from ndmapper.lib.cosmetics import init_bpm, add_bpm
from ndmapper.lib.gmos.spec.ifu import *
ndmapper.config['logfile'] = 'example.log'
ndmapper.config['reprocess'] = False
ndmapper.config['interact'] = False
rawdir = 'raw'

# Dictionaries to hold the processed calibrations, keyed by name:
biases, traces, flats, arcs, standards = {}, {}, {}, {}, {}
# Download specified raw science exposures & open them as DataFiles:
obj_spec = DataFileList(
download_files(['S20120827S0066.fits', 'S20120827S0067.fits',
'S20120827S0068.fits'], server='gemini', dirname=rawdir)
)
# Look up the corresponding calibration files (& their calibrations in turn)
# in the Gemini archive. For now, a "trace" entry has to be added manually for
# the arc after running this ("trace": "S20120827S0069_flat.fits") but is
# preserved when re-running the step.
cal_dict = look_up_cals(obj_spec, dependencies=CAL_DEPS, server='gemini',
cache='calibrations.json')
# Download the matching calibrations:
download_files(cal_dict, server='gemini', dirname=rawdir)
# Process the biases:
for name, flist in cal_entries(cal_dict, 'bias'):
bias_im = DataFileList(flist, dirname=rawdir)
bias_im = make_bias(bias_im, name) # also used for the BPM below
biases[name] = bias_im
# Create manually-defined bad pixel masks for later use (based on a reference
# that has already had its overscan trimmed off). To keep the script generic,
# this should be replaced with a step that picks out a correct pre-defined BPM
# for the GMOS detectors used, but the following is illustrative because the
# exact regions are somewhat dataset-specific and it's currently one of the few
# processing steps implemented purely in Python.
basic_bpm = init_bpm(
    bias_im,
    [['1909:1912,306:822'],
     ['1909:1913,1:4608', '1498:1501,3896:4608', '408:412,1126:4608',
      '782,2352:2820', '1090,2062:2308'],
     ['1415:1420,2622:4608', '1116:1120,4530:4608']],
    convention='FITS', filename='bpm_1x1_basic.fits'
)
# basic_bpm.save()
# Trace each flat as a reference for extraction:
for name, flist in cal_entries(cal_dict, 'flat'):
flat_spec = DataFileList(flist, dirname=rawdir)
flat_spec = prepare(flat_spec)
flat_spec = associate_cals(biases, flat_spec, 'bias', cal_dict=cal_dict)
flat_spec = subtract_bias(flat_spec)
flat_spec = add_bpm(flat_spec, basic_bpm)
flats[name] = flat_spec # save intermediate result for later processing
out_spec = [FileName(df, suffix='_init') for df in flat_spec]
flat_spec = extract_spectra(flat_spec, out_spec, interact=False)
traces[name] = flat_spec[0]
# Extract the arcs:
for name, flist in cal_entries(cal_dict, 'arc'):
arc_spec = DataFileList(flist, dirname=rawdir)
arc_spec = prepare(arc_spec)
arc_spec = associate_cals(biases, arc_spec, 'bias', cal_dict=cal_dict)
arc_spec = subtract_bias(arc_spec)
arc_spec = associate_cals(traces, arc_spec, 'trace', cal_dict=cal_dict)
arc_spec = extract_spectra(arc_spec)
# (could combine multiple exposures here if needed)
arc_spec = calibrate_wavelength(arc_spec, interact=False)
arcs[name] = arc_spec[0] # assume there's only one for now
arc_spec = associate_cals(arcs, arc_spec, 'arc', from_type='self')
arc_spec = rectify_wavelength(arc_spec)
# Re-process the flats (eventually with scattered light subtraction, but for
# now it's just to complete the outline of the process):
for name in flats:
flat_spec = flats[name]
flat_spec = associate_cals(traces, flat_spec, 'trace', from_type='self')
flat_spec = extract_spectra(flat_spec)
flat_spec = associate_cals(arcs, flat_spec, 'arc', cal_dict=cal_dict)
# flat_spec = rectify_wavelength(flat_spec)
# Update the list of part-processed flats with the final ones:
flats[name] = make_flat(flat_spec, name)[0]
# Process the standard:
for name, flist in cal_entries(cal_dict, 'specphot'):
std_spec = DataFileList(flist, dirname=rawdir)
std_spec = prepare(std_spec)
std_spec = associate_cals(biases, std_spec, 'bias', cal_dict=cal_dict)
std_spec = subtract_bias(std_spec)
std_spec = associate_cals(traces, std_spec, 'trace', from_type='flat',
cal_dict=cal_dict)
std_spec = associate_cals(flats, std_spec, 'flat', cal_dict=cal_dict)
std_spec = extract_spectra(std_spec)
std_spec = associate_cals(arcs, std_spec, 'arc', cal_dict=cal_dict)
std_spec = rectify_wavelength(std_spec)
std_spec = subtract_sky(std_spec)
# std_cube = resample_to_cube(std_spec) # just to check PSF
std_spec = sum_spectra(std_spec)
# For now there's no way to avoid specifying where to look for the flux
# table corresponding to the standard observed, limiting the generality
# of this example a bit, unless the user copies it to the current dir.
standards[name] = calibrate_flux(std_spec, lookup_dir='gmos$calib/',
interact=False)[0]
# Check results:
std_spec = associate_cals(standards, std_spec, 'specphot', from_type='self')
std_spec = apply_flux_cal(std_spec)
# Reduce the science data:
obj_spec = prepare(obj_spec)
obj_spec = associate_cals(biases, obj_spec, 'bias', cal_dict=cal_dict)
obj_spec = subtract_bias(obj_spec)
obj_spec = add_bpm(obj_spec, basic_bpm)
obj_spec = associate_cals(traces, obj_spec, 'trace', from_type='flat',
cal_dict=cal_dict)
obj_spec = associate_cals(flats, obj_spec, 'flat', cal_dict=cal_dict)
obj_spec = extract_spectra(obj_spec)
obj_spec = associate_cals(arcs, obj_spec, 'arc', cal_dict=cal_dict)
obj_spec = rectify_wavelength(obj_spec)
obj_spec = subtract_sky(obj_spec)
obj_spec = associate_cals(standards, obj_spec, 'specphot', cal_dict=cal_dict)
obj_spec = apply_flux_cal(obj_spec)
obj_spec = resample_to_cube(obj_spec)
print obj_spec[0][0].data
Installation from PyPI (v0.1.1 from 12 Mar.) -- requires PyRAF:
# Ureka environment:
ur_setup -n ndmapper_test
# Conda (Python 2) environment:
conda create --name ndmapper_test astropy
source activate ndmapper_test
# Install:
pip install ndmapper
API documentation: http://ndmapper.readthedocs.org
Source code: https://github.com/jehturner/ndmapper
(A few library steps require minor fixes from my Gemini IRAF version at http://drforum.gemini.edu/topic/gmos-ifu-data-reduction-scripts/.)