Skip to content

mmfausnaugh/mapspec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mapspec

mapspec stands for Mcmc Algorithm for Parameters of SPECtra. It was originally designed to rescale astronomical spectra to a standard flux scale, using a method similar to that of van Groningen & Wanders 1992. Full details of the method are in Fausnaugh 2017, published in PASP (please cite this paper if you use the code). Along the way, several spectra analysis tools useful for dealing with bright/broad emission lines have been implemented.


Installation

mapspec is a fairly simple python package. Download it with git clone, cd to the mapspec directory, and run

python setup.py sdist
pip install -e ./

This installs to your full system---consider using a virtual environment (virtualenv or conda) or

python setup.py sdist
pip install ./ --user <MAPSPECDIR>

for more fine-grained control.

Alternatively, put the mapspec directory in your PYTHONPATH or your current working directory.

mapspec is compatible with python 2.7 and python 3.6 or later.

To test the installation, go to examples and try running the regression test script:

cd ../mapspec_test
bash run_test.sh

run_test.sh runs the regression test---it runs mapspec on some sample data and checks that the outputs match standardized files in mapspec_test/output_checks. The tests take about 10 minutes to run, and the results can be plotted with mapspec_test/plot_test.py. More details are in mapspec_test/README.

Quick Start

The workhorse script is script/do_map, which is available from the command line after installing mapspec and has a help function.

Here are the data files that you need:

  • A reference spectrum in 3 column ascii format (wavelength, flux, and flux error).
  • A file with wavelength regions for the emission line. This should have 3 rows: 1) the wavelength window of the emission line, 2) the wavelength window of the blueward continuum, and 3) the wavelength window of the redward continuum. Each line has 2 numbers: the blue edge of the wavelength region window and the red edge of the wavelength region window.
  • A list of files with spectra to rescale, 1 file per line. Like the reference, the spectra should be in 3 column ascii format.

Examples of all of these files are in examples/mapspec_test.

do_map runs a list of spectra through the rescaling model. This is the normal modus operandi for reverberation mapping studies, so do_map may have everything that most users need. The output (using the Gauss-Hermite smoothing kernel) will be saved in files called 'scale.h._<input_spectrum_name>'.

To see how do_map works, you can look at the source in mapspec/scripts/do_map.

do_map also does some simple model comparisons (Gaussian smoothing vs. Gauss-Hermite smoothing). The file mapspec.params compares the chi^2 from two different different rescaling models, and two output spectra are saved by default ('scale_<input_spectrum_name>' for Gaussian smoothing and 'scale.h._<input_spectrum_name>' for Gauss-Hermite smoothing).


Outline

mapspec is object-oriented and is designed to be modular and highly extensible. It makes heavy use of numpy and scipy libraries, and was designed to easily integrate with these packages.

There are two main parts of the package: First, general-usage spectrum utilities, which are bundled in mapspec/spectrum.py. Second, the rescaling procedure, which is kept in mapspec/mapspec.py.

Some helpful scripts for constructing a reference spectrum are also provided----these are make_ref and smooth_ref. Check the help menus and internal comments for details. By default, they are installed to your system and available on the command line.

We begin by describing (with illustrative examples) the spectrum utilities.


Spectrum

The primary object is the Spectrum class. Spectrum has 3 fundamental attributes: wavelength (Spectrum.wv), flux (Spectrum.f), and flux error (or measurement error, Spectrum.ef). Each attribute is a numpy array, so all the functionality of numpy/scipy is preserved. For example, to get the mean flux-level:

from mapspec.spectrum import Spectrum
my_spec = Spectrum()
print my_spec.f.mean()

or to get the number of pixels:

print my_spec.wv.size

(to actually put data in the Spectrum object, see "Reading Data" below). Spectrum also has many useful methods to interpolate, rebin, extinction-correct, and smooth (or log-smooth) the data. For example, this will rebin the spectrum to twice the original wavelength spacing:

import numpy as np
dx = my_spec.wv[1] - my_spec.wv[0] #assumes even wavelength spacing throughout
xnew = np.arange(my_spec.wv.min(), my_spec.wv.max() + 2.*dx, 2.*dx)
my_spec.rebin(xnew)

Emission Lines

The next object is the EmissionLine class. EmissionLine inherits from Spectrum, so it can be used in approximately the same way. To instantiate an EmissionLine, you need to specify the line window and surrounding continuum:

from mapspec.spectrum import EmissionLine
oiii = [ [5065, 5115],     #OIII 5007, at redshift 1.018
         [5059, 5065],     #blue continuum
         [5115, 5129]  ]   # red continuum
my_line = EmissionLine(my_spec, oiii[0], [oiii[1],oiii[2]] )

At instantization, EmissionLine fits a linear model to the local continuum (oiii[1] and oiii[2]) and subtracts the model underneath the emission line. That is, my_line.wv is the wavelengths of my_spec between oiii[0,0] and oiii[0,1] and my_line.f is the corresponding section of my_spec.f minus the continuum fit.

EmissionLine also has other useful methods, such as my_line.integrate_line(), which will use scipy.integrate.simps to return the total line flux, or my_line.fwhm(), which will return the full-width-at-half-maximum of the line.


Line Model

The last class is LineModel. At instantization, LineModel fits a function to an input EmissionLine:

from mapspec.spectrum import LineModel
my_lmodel = LineModel(my_line, func='gauss-hermite')

The raison d'être of LineModel is to generate flux predictions at arbitrary wavelengths. This is accomplished by overloading the class's call method, so you can think of LineModel as a kind of function:

xeval = np.linspace( my_line.wv.min(), my_line.wv.max(), 500 )
fpredict = my_lmodel(xeval)

You could use these predictions to, for example, subtract the line from the original spectrum:

mask = ( my_spec.wv > oiii[0,0])*(my_spec.wv < oiii[0,1] )
my_spec.f[mask] -= my_lmodel(my_spec.wv[mask])

If the parameters of the fit are of interest, you can access then with my_lmodel.p (and the covariance matrix, for errors, is stored in my_lmodel.covar), but you will have to check the individual functions/methods to see which parameter is which.


Reading Data

Input from a file is handled with inheritance. If you know what format your data are in, the following classes know how to read these formats and return fully functioning Spectrum objects:

from mapspec.spectrum import TextSpec, TextSpec_2c, FitsSpec

s1 = TextSpec('mockdata.txt')  #for 3 column format
s2 = TestSpec_2c('mockdata_2col.txt') #for 2 columns, assumes no errors
s3 = FitsSpec('mockdata.fits')

FitsSpec contains many keywords in order to deal with the vast diversity of .fits formats:

s3 = FitsSpec('mockdata.fits', extension = 1, data_axis = 1, error_axis = 2,
              x1key = 'CRVAL1',
              dxkey = 'CD1_1',
              p1key = 'CRPIX1'
              )

Two files with test data from NGC 5548 (3 column ascii) are in examples (test.LBT.dat and test.MDM.dat---test.dat is a copy of test.LBT.dat), and 6 spectra for rescaling are in examples/mapspec_test


Rescaling Spectra

The tools for rescaling spectra are in mapspec/mapspec.py. Most of the work is done by the RescaleModel class:

from mapspec.spectrum import *
from mapspec.mapspec import RescaleModel

sref = TextSpec('ref.txt')
lref = EmissionLine(sref, oiii[0], [ oiii[1], oiii[2] ] )

my_model = RescaleModel(lref, kernel = 'Gauss')

my_model will take an input emission line and try to align it with lref. The model consists of a wavelength shift, rescaling factor, and smoothing---the form of the smoothing kernal is specified by the kernel keyword (in the above example, a simple Gaussian).

The parameters of the model are stored in a dictionary, so they are easy to access. If you know what parameters you want, you can immediate apply the model to the data:

#shift by 0.25 x-units, multiply by 2, and smooth with kernel of width 1 x-units
my_model.p = {'shift': 0.25, 'scale': 2.0, 'width': 1.0} 
sout, lost_pix = my_model.output(my_spec)

lost_pix is a boolean array that tells you how much of the original spectrum is lost---you loose at least one pixel on either side for any non-zero shift. If you don't care about the original wavelength grid, it is suggested to change the wavelengths in place (i.e., my_spec.wv -= shift).

Usually, you will want to optimize these parameters so that they match the reference. To do this, a function metro_hast is included:

from mapspec.mapspec import metro_hast
chi2, p_best, frac_accept = metro_hast(5000, my_line, my_model)

This will run an MCMC for 5000 steps, and try to optimize my_model.p so that my_line matches lref (as a reminder, lref was specified when instantiating my_model---any RescaleModel object should probably have its own reference line). chi2 is the best (minimum) chi^2, p_best are the corresponding parameters, and frac_accept is the acceptance fraction of the chain.

You then apply the model as above:

my_model.p = p_best
sout,lost_pix =  my_model.output(my_spec)

metro_hast has some other useful options:

chi2, p_best, frac_accept, p_chain = metro_hast(5000, my_line, my_model, keep = True)

This will return the MCMC chain of parameters in p_chain. p_chain is actually its own object (mapspec.mapspec.Chain), but it is little more than a glorified NxM array that knows how to plot itself (it also stores the log-likelihood for each set of parameters in a separate attribute). Also

plt.ion()
chi2, p_best, frac_accept, p_chain = metro_hast(5000, my_line, my_model, plot = True)

will plot the MCMC chain as it runs.

Details

Warning on Interpolation

At the moment, only linear interpolation supports a rigorous error propagation. This includes calculating the covariance matrix due to both interpolation and smoothing when rescaling the data.

Other interpolation methods are supported---any 'kind' keyword for scipy.interpolate.interp1d can be used. There is also an implementation of bsplines using scipy.interpolate, and a custom built sinc interpolator (with a Lanczos window by default).

These alternative methods do not support full error propagation. For now, they will perform the same operation as called for on the flux spectrum, but on the square of the error spectrum (i.e., the variance). This usually results in errors similar to the input and roughly preserves the fractional uncertainty. However, we note that this is not strictly correct, and may be a feature that we improve in the future.

Calculation of the Likelihood and Fitting Procedure

mapspec aligns the data and the model by minimzing: (data - model)^2/error^2, where data is the input line, model is the reference value chosen at instantization, and error^2 = (reference error)^2 + (data error)^2. In other words, mapspec takes a maximum likelihood approach, assuming normally distributed residuals and independent uncertainties on the data and reference.

Because of wavelength shifts and edge-effects from convolution, there can be problems aligning the edges of the fitting region---in order to censor these points, it would be necessary to change the amount of data, and therefore the number of degrees-of-freedom, during the fit. To prevent this while mitigating the edge-effects, mapspec ignores 10% of the data (5% on each end) when calculating the likelihood. Although this is not an optimal solution, it does produces reasonable results and appears to be a good compromise between correctness, simplicity, and practicability.

It is therefore suggested to choose the fitting region (line wavelengths) slightly larger than would naively be expected for isolating the emission line flux. It is also suggested to remove large shifts (greater than 1 pixel) before performing the fit. A convenience function mapspec.mapspec.get_cc is provided to cross correlate two spectra---see do_map.py for an example.

In the Bayesian framework, priors are always added into the likelihood calculation. mapspec assumes uniform priors on all parameters, so the priors do not explicitly enter into the calculation of the log-likelihood. Note that a flat prior is informative for the case of the scaling parameter---however, because the rescaling is a factor of a few (not many orders of magnitude), this situation makes little difference.

When using a Gauss-Hermite Kernel, h3 and h4 are constrained to be > -0.3 and < 0.3; experimentation shows that this range is sufficient to produce a wide range of emission line shapes. The finite interval is enforced by setting the prior probability to 0 outside of this interval. Similarly, strange behavior will occur if the kernel width drops below the spectral resolution (a kind of undersampling/aliasing effect). A minimum kernel width is imposed at 1/2 of the wavelength spacing of the reference data.

RescaleModel objects have the option to add priors, either through a function provided by the user or from posterior distributions stored in Chain objects. See do_map.py for an example.


Methods

Spectrum

Object Method args Description
Spectrum __init__ style = 'linear' Change the interpolation style at instantization (implicitly calls set_interp).
Spectrum restore Restore attributes (Spectrum.wv,Spectrum.f, and Spectrum.ef) to values when spectrum was instantiated.
Spectrum interp xnew Returns interpolated flux and flux errors at location of array xnew.
Spectrum set_interp kwargs Change interpolation style (see style below). Keywords for custom b-splines and sinc interpolation are also provided here.
Spectrum rebin xnew Rebins the spectrum to wavelengths xnew. Modifies Spectrum.wv,Spectrum.f, and Spectrum.ef.
Spectrum extinction_correct E_BV, RV De-extinguish the spectrum for a CCM89 extinction law, for a given E(B-V) (E_BV) and R_V (RV). Modifies Spectrum.f, and Spectrum.ef.
Spectrum smooth width,name Smooths the spectrum with a kernel of width width specified with name (calculated with scipy.signal.get_window). Modifies Spectrum.f, and Spectrum.ef. Edge-effects are treated by replacing with the original spectrum (a number of pixels equal to width on both edges).
Spectrum velocity_smooth v_width Smooths the spectrum with a velocity dispersion of width v_wdith (in km/s) by rebinning evenly in log wavelength. Modifies Spectrum.f, and Spectrum.ef. Converts back to original wavelengths, but cannot support sinc interpolation (unevenly spaced wavelengths).
Spectrum wv Attribute numpy array of wavelengths.
Spectrum f Attribute numpy array of fluxes.
Spectrum ef Attribute numpy array of flux errors.
Spectrum style Attribute String with instructions for set_interp. Options are anything supported by scipy.interpolate.interp1d. Also available is a custom sinc interpolator (style = 'sinc'), which uses convolution in real space and a Lanczos window to prevent ringing (very slow). Bsplines (style= 'bspline') are also available (using a custom class to interface with splrep and splev from scipy.interpolate ).

EmissionLine

Object Method args Description
EmissionLine __init__ Spec, window, cwindow Spec is a Spectrum object from which to extract the EmissionLine. window gives the wavelength region of the line (inclusive, blue edge first). cwindow is a 2x2 list/array that specifies the continuum. Each row is a continuum region (exclusive), which must have the blue edge first, but the blue and red regions may be in any order.
EmissionLine integrate_line Returns total line flux using scipy.integrate.simps.
EmissionLine equivalent_width Returns integrated flux divided by mean continuum.
EmissionLine wv_mean Returns first moment of the line.
EmissionLine wv_median Returns median wavelength (wavelength that splits the integrated flux in 2).
EmissionLine dispersion Returns second moment of the line.
EmissionLine mad Returns Mean Absolute Deviation.
EmissionLine fwhm center Returns a full-width-at-half-maximum, following the procedure in Peterson et al. 2004; center is the central wavelength, to check if the line is double-peaked. Actually returns a tuple with fwhm, width on blue side, width on red-side, and a boolean for which True indicates double-peaked.
EmissionLine ip_wv frac Returns inter-percentile region in wavelength units. The inter-percentile region contains frac of the total flux, where frac is between 0 and 1. Actually returns a tuple with the inter-percentile region, the corresponding blue wavelength, and the corresponding red wavelength.
EmissionLine ipv frac,center Returns the inter-percentile region in velocity units. frac is defined as above, center is used to convert to velocities. Actually returns a tuple with the inter-pecentile region, blue width, and red width. Assumes km/s.
EmissionLine wv Attribute numpy array of wavelengths (subset of Spec.wv within window).
EmissionLine f Attribute numpy array of fluxes (matching subset of Spec.f minus continuum flux).
EmissionLine ef Attribute numpy array of flux errors (as EmissionLine.f, but error propagation from the continuum fit is included).
EmissionLine cf Attribute Continuum flux underneath the line.
EmissionLine ecf Attribute Error on continuum flux. This is added in quadrature to the original measurement errors in the line wavelength region.

LineModel

Object Method args Description
LineModel __init__ EmLine,func='gaussian',window=None,nline = 1,floating=False,linedata=None EmLine is an EmissionLine to which the model is fit. func is a string that specifies the functional form to fit to the line; can be 'gaussian', 'gauss-hermite', 'lorentzian', 'approx-voig' (sum of lorentzian and gaussian), or 'data' (empirical template). window is a boolean array that masks the input EmissionLine (True indicates that the pixel is included in the fit). nline indicates the number of lines (components) used in the fit. floating=True adds a parameter to the model that allows a constant offset or 'pedestal' flux. linedata is an EmissionLine that serves as a template if func = 'data'. See examples in examples/test_linefit.py.
LineModel __call__ x Returns fitted function evaluated at input x wavelengths (i.e., flux-predictions, x is an array).
LineModel chi2 Returns the value of chi^2 from the initial fit.
LineModel plot_components x,ax,cuse Plots individual components (if nline is > 1) on axis ax in color cuse at abcissas x.
LineModel p Attribute Parameters of the fitted model.
LineModel covar Attribute Covariance matrix of fitted parameters.
LineModel wv Attribute Saves EmLine.wv.
LineModel lf Attribute Saves EmLine.f.
LineModel elf Attribute Saves EmLine.ef.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages