Skip to the navigation links
Last modified: 2 January 2025

URL: https://cxc.cfa.harvard.edu/ciao/scripting/io.html

Reading and writing files in Python

The CIAO Data Model can be used to read and write FITS and ASCII files in Python, using the crates module. This provides a high-level interface to data; the cxcdm module is also provided for those cases where low-level access is required and the user is familiar with the Data Model documentation.

The AstroPy I/O module can be used to read and write files with Python, but the CIAO helpdesk does not provide support for this module. The AstroPy I/O modules do not recognize Data Model syntax, such as column selection, filtering, or binning.


The following imports are assumed below:

import numpy as np
from matplotlib import pyplot as plt

import pycrates

Many Crates routines can be used either as a function, such as get_colvals, or as a method on an object, in this case the get_column method of the TABLECrate object returned by read_file. This guide prefers the method option, but either can be used (although note that sometimes there are differences in what the calls return).

For most examples the following event file will be used:

% dmlist acisf13770_repro_evt2.fits blocks

--------------------------------------------------------------------------------
Dataset: acisf13770_repro_evt2.fits
--------------------------------------------------------------------------------

     Block Name                          Type         Dimensions
--------------------------------------------------------------------------------
Block    1: PRIMARY                        Null
Block    2: EVENTS                         Table        16 cols x 85650    rows
Block    3: GTI3                           Table         2 cols x 1        rows
Block    4: GTI2                           Table         2 cols x 1        rows
Block    5: GTI1                           Table         2 cols x 2        rows
Block    6: GTI0                           Table         2 cols x 1        rows
Block    7: GTI6                           Table         2 cols x 1        rows

Note that the CIAO contributed package includes the crates_contrib.utils and crates_contrib.images modules that may provide useful routines.


Reading a single block from a FITS file

The read_file function will read in the "most interesting" block (in this case the "EVENTS block), returning a "crate" object:

cr = pycrates.read_file("acisf13770_repro_evt2.fits")
print(cr)
   Crate Type:        <TABLECrate>
   Crate Name:        EVENTS
   Ncols:             16
   Nrows:             85650

print(cr.get_filename())
acisf13770_repro_evt2.fits
print(cr.get_colnames())
['time', 'expno', 'ccd_id', 'node_id', 'chip(chipx, chipy)',
'tdet(tdetx, tdety)', 'det(detx, dety)', 'sky(x, y)', 'phas',
'pha', 'pha_ro', 'energy', 'pi', 'fltgrade', 'grade', 'status']

There are a number of routines in the crates module for accessing data, such as get_col_names and get_colvals, as well as methods on the crate object, such as get_colnames and get_column (use the Python help routine for help). Some of the routines will return the data as a cratedata object:

tdata = cr.get_column("time")
print(tdata)
  Name:     time
  Shape:    (85650,)
  Datatype: float64
  Nsets:    85650
  Unit:     s
  Desc:     S/C TT corresponding to mid-exposure
  Eltype:   Scalar
  Range:
     Min:   455646659.46568
     Max:   455677834.84234

print(type(tdata))
print(f"name={tdata.name} units={tdata.unit} desc={tdata.desc}")
<class 'pycrates.cratedata.CrateData'>
name=time units=s desc=S/C TT corresponding to mid-exposure
tvals = tdata.values
print(type(tvals))
print(tvals[0:4])
<class 'numpy.ndarray'>
[4.55647616e+08 4.55647616e+08 4.55647616e+08 4.55647616e+08]
x = cr.get_column("x").values
y = cr.get_column("y").values

plt.hexbin(x, y, mincnt=1)
plt.gca().set_aspect(aspect=1)
plt.colorbar()
[A binned representation of the X and Y columns of the event file, similar to that created by DS9.]
[Print media version: A binned representation of the X and Y columns of the event file, similar to that created by DS9.]

The SKY coordinates (that is, x and y) displayed using hex binning.

The data can be filtered, for instance to remove the low and high energy values:

energy = cr.get_column("energy").values
idx, = np.where((energy >= 500) & (energy < 2000))

plt.hexbin(x[idx], y[idx], mincnt=1)
plt.gca().set_aspect(aspect=1)
plt.colorbar()
[The filter applied to the data means that the faing emission of the source is much-more visible in the data now.]
[Print media version: The filter applied to the data means that the faing emission of the source is much-more visible in the data now.]

The events have been filtered to accept only those with energies between 500 and 2000 eV.


Reading an ASCII file

The same routine (read_file) can also be used to read in an ASCII file. First we use the dmcopy tool to copy a subset of the columns from the event file to an ASCII file:

% dmcopy "pcadf13770_001N001_asol1.fits[cols time,ra,dec]" "asol.dat[opt kernel=text/simple]"
% file asol.dat
asol.dat: ASCII text
% head -4 asol.dat
#TEXT/SIMPLE
# time ra dec
4.5564730880322e+08 286.7639309292 4.524806469711
4.5564730905947e+08 286.7640088001 4.524777927964
  

This can then be used with read_file, which returns a TABLECrate object, just as the FITS version did above:

acr = pycrates.read_file("asol.dat")
print(acr)
   Crate Type:        <TABLECrate>
   Crate Name:        asol.dat
   Ncols:             3
   Nrows:             113138

As an example, the aspect solution can be plotted:

ra = acr.get_column("ra").values
dec = acr.get_column("dec").values
t = acr.get_column("time").values

dt = t - t[0]
dra = 3600 * (ra - ra.mean())
ddec = 3600 * (dec - dec.mean())

plt.plot(dt, dra, '-', alpha=0.5, c="black", label="RA")
plt.plot(dt, ddec, '-', alpha=0.5, c="gray", label="Dec")
plt.legend()
plt.xlabel(r"$\Delta T$ (s)")
plt.ylabel(r"$\Delta$ (arcsec)")
[The RA and DEC values vary with time periodically (via a Lissajous curve which means the values are related but they do not just simply repeat).]
[Print media version: The RA and DEC values vary with time periodically (via a Lissajous curve which means the values are related but they do not just simply repeat).]
[NOTE]
Note

The ASCII format does not have all the metadata that the FITS format has, as can be seen in the lack of extra information - such as a description, units, or a data range - in the CrateData object:

print(acr.get_column("time"))
  Name:     time
  Shape:    (113138,)
  Datatype: float64
  Nsets:    113138
  Unit:
  Desc:
  Eltype:   Scalar
  Range:
     Min:   -1.7976931348623157e+308
     Max:   1.7976931348623157e+308

This is partly because the "simple" ASCII format was used when creating the asol.dat file. If opt kernel=text/dtf were used in the dmcopy call above then more metadata would be available, but the file itself would be harder to parse by code outside of CIAO.


Reading in all the blocks in a file

All the blocks can be accessed using the read_dataset routine:

ds = pycrates.read_dataset("acisf13770_repro_evt2.fits")
print(type(ds))
<class 'pycrates.cratedataset.CrateDataset'>
print(ds)
   Crate Dataset:
     File Name:         acisf13770_repro_evt2.fits
     Read-Write Mode:   rw
     Number of Crates:  7
       1)     Crate Type:        <IMAGECrate>
   Crate Name:        PRIMARY

       2)     Crate Type:        <TABLECrate>
   Crate Name:        EVENTS
   Ncols:             16
   Nrows:             85650

       3)     Crate Type:        <TABLECrate>
   Crate Name:        GTI3
   Ncols:             2
   Nrows:             1

       4)     Crate Type:        <TABLECrate>
   Crate Name:        GTI2
   Ncols:             2
   Nrows:             1

       5)     Crate Type:        <TABLECrate>
   Crate Name:        GTI1
   Ncols:             2
   Nrows:             2

       6)     Crate Type:        <TABLECrate>
   Crate Name:        GTI0
   Ncols:             2
   Nrows:             1

       7)     Crate Type:        <TABLECrate>
   Crate Name:        GTI6
   Ncols:             2
   Nrows:             1

[NOTE]
Note

Although read_dataset has read in enough of the data file to recognize the different blocks, and the columns in these blocks, it will not read in the actual data values until a method like get_column or routines like get_colvals and get_piximg are called.


Reading the metadata

The keyword values can be found either as a cratedata object

ekey = cr.get_key("EXPOSURE")
print(ekey)
   Name:   EXPOSURE
   Value:  28309.22660496
   Unit:   s
   Desc:   Exposure time
   Groups: ['BASIC', 'EVENT', 'O_1', 'O_B', 'O_E']

or as a scalar

print(cr.get_key_value("EXPOSURE"))
28309.22660496
[NOTE]
Note

The CIAO Data Model is not the same as the FITS system, and so some FITS header keys, particularly those that represent how column and image data are stored (such as TUNIT1 and TTYPE1), are not available as keywords. The values are available through other means - such as accessing the list of column names in a table crate, or the data type stored in the values field of a CrateData object.

Some of the metadata is accessible with the Subspace and History and Comments routines of the crates module.


Writing out a FITS block

Although it is possible to create a crate directly - with TABLECrate or IMAGECrate - it is easiest to read in a file, edit it, and then write it out. This can be done with the write_file routine or with the write method of the Crate class.

As an example, the ASCII file from above can have

  1. a units field added to the TIME column;
  2. a new column, set to the time written as an offset from the first time and stored in kiloseconds;
  3. and converted to FITS format.
acr = pycrates.read_file("asol.dat")
t = acr.get_column("time")
t.unit = "s"

newcol = pycrates.CrateData()
newcol.name = "dt"
newcol.unit = "ks"
newcol.desc = "Offset time"
newcol.values = (t.values - t.values[0]) / 1000
acr.add_column(newcol)

acr.write("asol.fits", clobber=True)

This creates the file:

% dmlist asol.fits cols

--------------------------------------------------------------------------------
Columns for Table Block asol.dat
--------------------------------------------------------------------------------

ColNo  Name                 Unit        Type             Range
   1   time                 s            Real8          -Inf:+Inf
   2   ra                                Real8          -Inf:+Inf
   3   dec                               Real8          -Inf:+Inf
   4   dt                   ks           Real8          -Inf:+Inf            Offset time

Writing out an ASCII file

The DataModel opt argument is used to specify the output file format in a write method call, write_file call, or write_dataset call.

acr.write("asol_copy.dat[opt kernel=text/dtf]", clobber=True)

Since this uses the CIAO Data Text format, which uses a FITS-like encoding in ASCII, the output includes much of the metadata, such as column units and descriptions, as the FITS format does. Using "opt kernel=text/simple" would produce an ASCII output file with no metadata.

% head asol_copy.dat
#TEXT/DTF
XTENSION= "TABLE   "
HDUNAME = "asol.dat"
EXTNAME = "asol.dat"
TFIELDS =                    4
TTYPE1  = "time    "
TFORM1  = "1D      "           / data format of field.
TUNIT1  = "s       "           / physical unit of field.
TTYPE2  = "ra      "
TFORM2  = "1D      "           / data format of field.
% dmlist asol_copy.dat cols

--------------------------------------------------------------------------------
Columns for Table Block asol.dat
--------------------------------------------------------------------------------

ColNo  Name                 Unit        Type             Range
   1   time                 s            Real8          -Inf:+Inf
   2   ra                                Real8          -Inf:+Inf
   3   dec                               Real8          -Inf:+Inf
   4   dt                   ks           Real8          -Inf:+Inf            Offset time

Writing out all the FITS blocks

The CrateDataset object also has a write method, or write_dataset call, that will write all the blocks out. For example

ds.write("copy.fits", clobber=True)

Create a column

A column is created by adding a CrateData object to a TABLECrate with the add_col routine or the add_column method of the TABLECrate object.

cd1 = pycrates.CrateData()
cd1.name = "zF"
cd1.values = [1, 3, -2.3]
cd1.desc = "The amazing zF value"
print(cd1)
  Name:     zF
  Shape:    (3,)
  Datatype: float64
  Nsets:    3
  Unit:     
  Desc:     The amazing zF value
  Eltype:   Scalar
  Range:    
     Min:   None
     Max:   None

The object can be added to a new crate, or to an existing one, as shown earlier:

out1 = pycrates.TABLECrate()
out1.name = "TESTDATA"
out1.add_column(cd1)
out1.write("test.fits")

The output is a FITS file (as no DM option was included in the output file name). The output file includes the explitly-set metadata - such as the block and column names, the number of columns and rows, and the data - as well as automaticlly-created ones (e.g. the use of Real8 rather than Real4 for the data type of the column).

% dmlist test.fits blocks
 
--------------------------------------------------------------------------------
Dataset: test.fits
--------------------------------------------------------------------------------
 
     Block Name                          Type         Dimensions
--------------------------------------------------------------------------------
Block    1: PRIMARY                        Null        
Block    2: TESTDATA                       Table         1 cols x 3        rows
% dmlist test.fits cols
 
--------------------------------------------------------------------------------
Columns for Table Block TESTDATA
--------------------------------------------------------------------------------
 
ColNo  Name                 Unit        Type             Range
   1   zF                                Real8          -Inf:+Inf            The amazing zf value
% dmlist test.fits data,clean
#  zF
                  1.0
                  3.0
                -2.30
[NOTE]
Note

Columns of different lengths can be added to a crate, as the crate will be re-sized to the maximum length and the other columns set to a default value (e.g. 0 or the empty string) for the extra rows.

Creating a vector column

The CIAO data model allows pairs of columns to be given a name, such as SKY representing the X and Y columns:

% dmlist acisf13770_repro_evt2.fits"[cols sky]" cols
 
--------------------------------------------------------------------------------
Columns for Table Block EVENTS
--------------------------------------------------------------------------------
 
ColNo  Name                 Unit        Type             Range
   1   sky(x,y)             pixel        Real4          0.50:     8192.50    sky coordinates
 
--------------------------------------------------------------------------------
World Coord Transforms for Columns in Table Block EVENTS
--------------------------------------------------------------------------------
 
ColNo    Name
1:    EQPOS(RA ) = (+286.7680)[deg] +TAN[(-0.000136667)* (sky(x)-(+4096.50))]
           (DEC)   (+4.5239  )           (+0.000136667)  (   (y) (+4096.50)) 

The pycrates create_vector_column routine is used to create such a vector column. It returns a single CrateData object which represents the two components of the vector, in this case COMBO(XC, YC):

cdvec = pycrates.create_vector_column("COMBO", ["XC", "YC"])
xc = np.asarray([12, 14, 15])
yc = np.asarray([2, 1, 3])
vec = np.vstack((xc, yc)).T
cdvec.values = vec
print(cdvec)
out1.add_column(cdvec)
out1.write("vtest.fits")
  Name:     COMBO
  Shape:    (3, 2)
  Datatype: int64
  Nsets:    3
  Unit:     
  Desc:     
  Eltype:   Vector 
     NumCpts:   2
     Cpts:      ['XC', 'YC']
  Range:    
     Min:   None
     Max:   None

Note that the data is arranged in pairs, that is (12, 2) then (14, 1) and ending with (14, 3), rather than (12, 14, 15) and then (2, 1, 3). There are several ways to achieve this with NumPy, such as the approach above of using the transpose of the vertical stack.

% dmlist vtest.fits cols
 
--------------------------------------------------------------------------------
Columns for Table Block TESTDATA
--------------------------------------------------------------------------------
 
ColNo  Name                 Unit        Type             Range
   1   zF                                Real8          -Inf:+Inf            The amazing zf value
   2   COMBO(XC,YC)                      Int4           -                    
% dmlist vtest.fits data
 
--------------------------------------------------------------------------------
Data for Table Block TESTDATA
--------------------------------------------------------------------------------
 
ROW    zF                   COMBO(XC,YC)
 
     1                  1.0               (12,2)
     2                  3.0               (14,1)
     3                -2.30               (15,4)

Removing a column

Columns can be removed from a crate with the delete_col routine or the delete_column method of the parent crate:

out1.delete_column("COMBO")

Create an image

Am image is also created with a CrateData object, but this time added to an IMAGECrate with the add_piximg routine or the add_image method of the IMAGECrate object.

cd2 = pycrates.CrateData()
cd2.name = "ivals"
cd2.values = np.arange(12).reshape(3, 4)
cd2.unit = "m"
cd2.desc = "A superlative image"
print(cd2)
  Name:     ivals
  Shape:    (3, 4)
  Datatype: int64
  Nsets:    3
  Unit:     m
  Desc:     A superlative image
  Eltype:   Array
     Ndim:     1
     Dimarr:   (4,)
  Range:    
     Min:   None
     Max:   None
out2 = pycrates.IMAGECrate()
out2.name = "IMGDATA"
out2.add_image(cd2)
print(out2)
   Crate Type:        <IMAGECrate>
   Crate Name:        IMGDATA

Rather than write out a single block, we use a CrateDataset to store both the previous blocks (out1 and out2), with the image block added first to avoid creating an empty PRIMARY block:

ds2 = pycrates.CrateDataset()
ds2.add_crate(out2)
ds2.add_crate(out1)
print(ds2)
ds2.write("ds.fits")
   Crate Dataset:
     File Name:         
     Read-Write Mode:   rw
     Number of Crates:  2
       1)     Crate Type:        <IMAGECrate>
   Crate Name:        IMGDATA

       2)     Crate Type:        <TABLECrate>
   Crate Name:        TESTDATA
   Ncols:             1
   Nrows:             3

The FITS format doesn't support all the metadata (e.g. the cd2.name field):

% dmlist ds.fits blocks
 
--------------------------------------------------------------------------------
Dataset: ds.fits
--------------------------------------------------------------------------------
 
     Block Name                          Type         Dimensions
--------------------------------------------------------------------------------
Block    1: IMGDATA                        Image      Int4(4x3)
Block    2: TESTDATA                       Table         1 cols x 3        rows
% dmlist ds.fits cols
 
--------------------------------------------------------------------------------
Columns for Image Block IMGDATA
--------------------------------------------------------------------------------
 
ColNo  Name                 Unit        Type             Range
   1   IMGDATA[4,3]         m            Int4(4x3)      -                    
 
--------------------------------------------------------------------------------
Physical Axis Transforms for Image Block IMGDATA
--------------------------------------------------------------------------------
 
Group# Axis# 
   1   1      X                    = #1 
   2   2      Y                    = #2 
 
--------------------------------------------------------------------------------
World Coordinate Axis Transforms for Image Block IMGDATA
--------------------------------------------------------------------------------
 
Group# Axis# 
[NOTE]
Note

Please be careful with the axis ordering of the data. Here a NumPy array of shape (3, 4) - which uses C ordering, so y then x - is written out as a 4 x 3 FITS array - which uses FORTRAN ordering (x then y).