ModelBase

class mcalf.models.ModelBase(*, original_wavelengths=None, constant_wavelengths=None, delta_lambda=0.05, sigma=None, prefilter_response=None, prefilter_ref_main=None, prefilter_ref_wvscl=None, output=None, config=None)[source]

Bases: object

Base class for spectral line model fitting.

Warning

This class should not be used directly. Use derived classes instead.

Parameters:

original_wavelengths (array_like) – One-dimensional array of wavelengths that correspond to the uncorrected spectral data.
stationary_line_core (float, optional, default=None) – Wavelength of the stationary line core.
constant_wavelengths (array_like, ndim=1, optional, default= see description) – The desired set of wavelengths that the spectral data should be rescaled to represent. It is assumed that these have constant spacing, but that may not be a requirement if you specify your own array. The default value is an array from the minimum to the maximum wavelength of original_wavelengths in constant steps of delta_lambda, overshooting the upper bound if the maximum wavelength has not been reached.
delta_lambda (float, optional, default=0.05) – The step used between each value of constant_wavelengths when its default value has to be calculated.
sigma (optional, default=None) – Sigma values used to weight the fit. This attribute should be set by a child class of ModelBase.
prefilter_response (array_like, length=n_wavelengths, optional, default= see note) – Each constant wavelength scaled spectrum will be corrected by dividing it by this array. If prefilter_response is not given, and prefilter_ref_main and prefilter_ref_wvscl are not given, prefilter_response will have a default value of None.
prefilter_ref_main (array_like, optional, default= None) – If prefilter_response is not specified, this will be used along with prefilter_ref_wvscl to generate the default value of prefilter_response.
prefilter_ref_wvscl (array_like, optional, default=None) – If prefilter_response is not specified, this will be used along with prefilter_ref_main to generate the default value of prefilter_response.
config (str, optional, default=None) – Filename of a .yml file (relative to current directory) containing the initialising parameters for this object. Parameters provided explicitly to the object upon initialisation will override any provided in this file. All (or some) parameters that this object accepts can be specified in this file, except neural_network and config. Each line of the file should specify a different parameter and be formatted like emission_guess: ‘[-inf, wl-0.15, 1e-6, 1e-6]’ or original_wavelengths: ‘original.fits’ for example. When specifying a string, use ‘inf’ to represent np.inf and ‘wl’ to represent stationary_line_core as shown. If the string matches a file, mcalf.utils.misc.load_parameter() is used to load the contents of the file.
output (str, optional, default=None) – If the program wants to output data, it will place it relative to the location specified by this parameter. Some methods will only save data to a file if this parameter is not None. Such cases will be documented where relevant.
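
The precedence described for config (explicit arguments override values from the file, which in turn override the defaults) can be pictured as a layered dictionary merge. This is only a minimal sketch of that idea, not MCALF's implementation: merge_parameters is a hypothetical helper and the example values are made up.

```python
def merge_parameters(defaults, from_config, explicit):
    """Hypothetical helper: combine parameters, later sources taking priority."""
    merged = dict(defaults)      # class defaults (lowest priority)
    merged.update(from_config)   # values read from the .yml file
    merged.update(explicit)      # keyword arguments passed explicitly (highest priority)
    return merged


defaults = {"delta_lambda": 0.05, "output": None}
from_config = {"delta_lambda": 0.1, "output": "results"}
explicit = {"delta_lambda": 0.02}

parameters = merge_parameters(defaults, from_config, explicit)
print(parameters)
```

Here delta_lambda comes from the explicit argument, while output falls back to the value in the file.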

original_wavelengths

One-dimensional array of wavelengths that correspond to the uncorrected spectral data.

Type: array_like

stationary_line_core

Wavelength of the stationary line core.

Type: float, optional, default=None

neural_network

The neural network classifier object that is used to classify spectra. This attribute should be set by a child class of ModelBase.

Type: optional, default=None

constant_wavelengths

The desired set of wavelengths that the spectral data should be rescaled to represent. It is assumed that these have constant spacing, but that may not be a requirement if you specify your own array. The default value is an array from the minimum to the maximum wavelength of original_wavelengths in constant steps of delta_lambda, overshooting the upper bound if the maximum wavelength has not been reached.

Type: array_like, ndim=1, optional, default= see description
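
One reading of the default described for constant_wavelengths is a grid that starts at the minimum wavelength and takes steps of delta_lambda until the maximum is reached or passed. The sketch below follows that reading in plain Python; default_constant_wavelengths is a hypothetical name, and MCALF's own calculation may differ in detail (for example in how it handles floating-point rounding).

```python
import math


def default_constant_wavelengths(original_wavelengths, delta_lambda=0.05):
    """Hypothetical helper: evenly spaced grid covering the original wavelengths."""
    start = min(original_wavelengths)
    stop = max(original_wavelengths)
    # Round before taking the ceiling so floating-point noise such as
    # 2.0000000001 does not add a spurious extra step.
    n_steps = math.ceil(round((stop - start) / delta_lambda, 9))
    return [start + i * delta_lambda for i in range(n_steps + 1)]


# The step does not divide the range evenly, so the last point overshoots 1.0.
grid = default_constant_wavelengths([0.0, 0.3, 1.0], delta_lambda=0.4)
print(grid)
```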

sigma

Sigma values used to weight the fit. This attribute should be set by a child class of ModelBase.

Type: optional, default=None

prefilter_response

Each constant wavelength scaled spectrum will be corrected by dividing it by this array. If prefilter_response is not given, and prefilter_ref_main and prefilter_ref_wvscl are not given, prefilter_response will have a default value of None.

Type: array_like, length=n_wavelengths, optional, default= see note

output

If the program wants to output data, it will place it relative to the location specified by this parameter. Some methods will only save data to a file if this parameter is not None. Such cases will be documented where relevant.

Type: str, optional, default=None

array

Array holding spectra.

Type: numpy.ndarray, dimensions are [‘time’, ‘row’, ‘column’, ‘spectra’]

background

Array holding spectral backgrounds.

Type: numpy.ndarray, dimensions are [‘time’, ‘row’, ‘column’]

Attributes Summary

`default_kwargs`
`default_modelbase_kwargs`
`stationary_line_core`

Methods Summary

`_curve_fit`(model, spectrum, guess, sigma, ...)	`scipy.optimize.curve_fit()` wrapper with error handling.
`_fit`(spectrum[, classification, spectrum_index])	Fit a single spectrum for the given profile or classification.
`_get_time_row_column`([time, row, column])	Validate and infer the time, row and column index.
`_load_data`(array[, names, target])	Load a specified array into the model object.
`_set_prefilter`()	Set the prefilter_response parameter.
`_validate_base_attributes`()	Validate some of the object's attributes.
`classify_spectra`([time, row, column, ...])	Classify the specified spectra.
`fit`([time, row, column, spectrum, ...])	Fits the model to specified spectra.
`fit_spectrum`(spectrum, **kwargs)	Fits the specified spectrum array.
`get_spectra`([time, row, column, spectrum, ...])	Gets corrected spectra from the spectral array.
`load_array`(array[, names])	Load an array of spectra.
`load_background`(array[, names])	Load an array of spectral backgrounds.
`test`(X, y)	Test the accuracy of the trained neural network.
`train`(X, y)	Fit the neural network model to spectra matrix X and spectra labels y.

Attributes Documentation

default_kwargs = {'constant_wavelengths': None, 'delta_lambda': 0.05, 'original_wavelengths': None, 'output': None, 'prefilter_ref_main': None, 'prefilter_ref_wvscl': None, 'prefilter_response': None, 'sigma': None, 'stationary_line_core': None}

default_modelbase_kwargs = {'constant_wavelengths': None, 'delta_lambda': 0.05, 'original_wavelengths': None, 'output': None, 'prefilter_ref_main': None, 'prefilter_ref_wvscl': None, 'prefilter_response': None, 'sigma': None, 'stationary_line_core': None}

stationary_line_core

Methods Documentation

_curve_fit(model, spectrum, guess, sigma, bounds, x_scale, time=None, row=None, column=None, **kwargs)[source]

scipy.optimize.curve_fit() wrapper with error handling.

Passes a certain set of parameters to the scipy.optimize.curve_fit() function and catches some typical errors, presenting a more specific warning message.

Parameters:

model (callable) – The model function, f(x, …). It must take the ModelBase.constant_wavelengths attribute as the first argument and the parameters to fit as separate remaining arguments.
spectrum (array_like) – The dependent data, with length equal to that of the ModelBase.constant_wavelengths attribute.
guess (array_like, optional) – Initial guess for the parameters to fit.
sigma (array_like) – Determines the uncertainty in the spectrum. Used to weight certain regions of the spectrum.
bounds (2-tuple of array_like) – Lower and upper bounds on each parameter.
x_scale (array_like) – Characteristic scale of each parameter.
time (optional, default=None) – The time index for error handling.
row (optional, default=None) – The row index for error handling.
column (optional, default=None) – The column index for error handling.

Returns:

fitted_parameters (numpy.ndarray, length=n_parameters) – The parameters that recreate the model fitted to the spectrum.
success (bool) – Whether the fit was successful or an error had to be handled.

See also

fit: General fitting method.
fit_spectrum: Explicit spectrum fitting method.

Notes

More details can be found in the documentation for scipy.optimize.curve_fit() and scipy.optimize.least_squares().
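
The general pattern of wrapping a fitting call so that a failed fit produces a warning and a success flag, rather than an exception, might look like the sketch below. It uses only the standard library and a stand-in fitter so that it runs on its own. guarded_fit and failing_fitter are hypothetical names, and returning NaN placeholders on failure is an assumption rather than documented behaviour. The RuntimeError case mirrors the exception that scipy.optimize.curve_fit() raises when the least-squares minimisation fails.

```python
import math
import warnings


def guarded_fit(fitter, n_parameters, time=None, row=None, column=None):
    """Hypothetical wrapper: call fitter() and report failure instead of raising."""
    try:
        return list(fitter()), True
    except RuntimeError as error:
        location = f"time={time}, row={row}, column={column}"
        warnings.warn(f"Fit failed at {location}: {error}")
        # Assumption: failed fits are marked with NaN placeholders.
        return [math.nan] * n_parameters, False


def failing_fitter():
    # Stand-in for an optimiser that does not converge.
    raise RuntimeError("optimal parameters not found")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parameters, success = guarded_fit(failing_fitter, 3, time=0, row=1, column=2)

print(success, len(caught))
```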

_fit(spectrum, classification=None, spectrum_index=None, **kwargs)[source]

Fit a single spectrum for the given profile or classification.

Warning

This call signature and docstring specify how the _fit method must be implemented in each subclass of ModelBase. It is not implemented in this class.

Parameters:

spectrum (numpy.ndarray, ndim=1, length=n_constant_wavelengths) – The spectrum to be fitted.
classification (int, optional, default=None) – Classification to determine the fitted profile to use.
spectrum_index (array_like or list or tuple, length=3, optional, default=None) – The [time, row, column] index of the spectrum provided. Only used for error reporting.

Returns:

result – Outcome of the fit returned in a mcalf.models.FitResult object.

Return type:

mcalf.models.FitResult

See also

fit: The recommended method for fitting spectra.
mcalf.models.FitResult: The object that the fit method returns.

Notes

This method is called for each requested spectrum by the models.ModelBase.fit() method. This is where most of the adjustments to the fitting method should be made. See other subclasses of models.ModelBase for examples of how to implement this method in a new subclass. See models.ModelBase.fit() for more information on how this method is called.

_get_time_row_column(time=None, row=None, column=None)[source]

Validate and infer the time, row and column index.

Takes any time, row and column indices given. If an index is not specified, it is returned as 0 when the spectral array has only one value along that dimension. If that dimension has multiple values and no index is specified, an error is raised due to the ambiguity.

Parameters:

time (optional, default=None) – The time index.
row (optional, default=None) – The row index.
column (optional, default=None) – The column index.

Returns:

time – The corrected time index.
row – The corrected row index.
column – The corrected column index.

See also

mcalf.utils.misc.make_iter: Make a variable iterable.

Notes

No type checking is done on the input indices, so they can be anything, but in most cases they will need to be either an integer or an iterable. The mcalf.utils.misc.make_iter() function can be used to make indices iterable.
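
The inference rule for a single dimension can be sketched as follows. This is an illustration rather than the library's code: infer_index is a hypothetical helper, and the error message is modelled on the one shown in the get_spectra() examples.

```python
def infer_index(index, dim_length, name):
    """Hypothetical helper: resolve an unspecified index for one dimension."""
    if index is None:
        if dim_length == 1:
            return 0  # only one possible value, so there is no ambiguity
        raise ValueError(f"{name} index must be specified as multiple indices exist")
    return index  # given indices are passed through without type checking


print(infer_index(None, 1, "time"))       # single time step: inferred
print(infer_index(range(4), 4, "row"))    # explicit index: returned unchanged
```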

_load_data(array, names=None, target=None)[source]

Load a specified array into the model object.

Load array with dimension names names into the attribute specified by target.

Parameters:

array (numpy.ndarray) – The array to load.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names depend on target.
target ({'array', 'background'}) – The attribute to load the array into.

See also

load_array: Load an array of spectra.
load_background: Load an array of spectral backgrounds.

_set_prefilter()[source]

Set the prefilter_response parameter.

Deprecated since version 0.2: Prefilter response correction code, and prefilter_response, prefilter_ref_main and prefilter_ref_wvscl, may be removed in a later release of MCALF. Spectra should be fully processed before loading into MCALF.

This method should be called in a child class once stationary_line_core has been set.

_validate_base_attributes()[source]

Validate some of the object’s attributes.

Raises: ValueError – To signal that an attribute is not valid.

classify_spectra(time=None, row=None, column=None, spectra=None, only_normalise=False)[source]

Classify the specified spectra.

Will also normalise each spectrum such that its intensity will range from zero to one.

Parameters:

time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, a numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectra (numpy.ndarray, optional, default=None) – The explicit spectra to classify. If only_normalise is False, this must be 1D. However, if only_normalise is set to true, spectra can be of any dimension. It is assumed that the final dimension is wavelengths, so return shape will be the same as spectra, except with no final wavelengths dimension.
only_normalise (bool, optional, default=False) – Whether the single spectrum given in spectra should not be interpolated and corrected. If set to true, the only processing applied to spectra will be a normalisation to be in range 0 to 1.

Returns:

classifications – Array of classifications with the same time, row and column indices as spectra.

Return type:

numpy.ndarray

See also

train: Train the neural network.
test: Test the accuracy of the neural network.
get_spectra: Get processed spectra from the object's array attribute.

Examples

Create a basic model:

>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8542.1, 8542.2, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Load a trained neural network:

>>> import pickle
>>> pkl = open('trained_neural_network.pkl', 'rb')  
>>> model.neural_network = pickle.load(pkl)  

Classify an individual spectrum:

>>> spectrum = np.random.rand(30)
>>> model.classify_spectra(spectra=spectrum)  
array([2])

When only_normalise=True, classify an n-dimensional spectral array:

>>> spectra = np.random.rand(5, 4, 3, 2, 30)
>>> model.classify_spectra(spectra=spectra, only_normalise=True).shape  
(5, 4, 3, 2)

Load spectra from a file and classify:

>>> from astropy.io import fits
>>> spectra = fits.open('spectra_0000.fits')[0].data  
>>> model.load_array(spectra, names=['wavelength', 'column', 'row'])  
>>> model.classify_spectra(column=range(10, 15), row=[7, 16])  
array([[[0, 2, 0, 3, 0],
        [4, 0, 1, 0, 0]]])
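
The normalisation to the range zero to one mentioned in the description is, in essence, a min-max rescaling of each spectrum. A minimal plain-Python sketch follows. normalise is a hypothetical helper, and the handling of a completely flat spectrum is an assumption made here to avoid a division by zero.

```python
def normalise(spectrum):
    """Hypothetical helper: rescale intensities to span the range 0 to 1."""
    lowest = min(spectrum)
    span = max(spectrum) - lowest
    if span == 0:
        # Assumption: a flat spectrum is mapped to zeros.
        return [0.0 for _ in spectrum]
    return [(value - lowest) / span for value in spectrum]


normalised = normalise([2.0, 4.0, 6.0])
print(normalised)
```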

fit(time=None, row=None, column=None, spectrum=None, classifications=None, background=None, n_pools=None, **kwargs)[source]

Fits the model to specified spectra.

Fits the model to an array of spectra using multiprocessing if requested.

Parameters:

time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectrum (numpy.ndarray, ndim=1, optional, default=None) – The explicit spectrum to fit the model to.
classifications (int or array_like, optional, default=None) – Classifications to determine the fitted profile to use. If not provided, the neural network will be used to classify the spectra. If a multidimensional array, must have the same shape as [time, row, column]. Dimensions that would have length of 1 can be excluded.
background (float, optional, default=None) – If provided, this value will be subtracted from the explicit spectrum provided in spectrum. Will not be applied to spectra found from the indices, use the load_background() method instead.
n_pools (int, optional, default=None) – The number of processing pools to calculate the fitting over. This allocates the fitting of different spectra to n_pools separate worker processes. When processing a large number of spectra this will make the fitting process take less time overall. It also distributes such that each worker process has the same ratio of classifications to process. This should balance out the workload between workers. If few spectra are being fitted, performance may decrease due to the overhead associated with splitting the evaluation over separate processes. If n_pools is not an integer greater than zero, it will fit the spectrum with a for loop.
**kwargs – Extra keyword arguments to pass to _fit().

Returns:

result – Outcome of the fits returned as a list of FitResult objects.

Return type:

list of FitResult, length=n_spectra

Examples

Create a basic model:

>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Set up the neural network classifier:

>>> model.neural_network = ...  # load an untrained classifier  
>>> model.train(...)  
>>> model.test(...)  

Load the spectra and background array:

>>> model.load_array(...)  
>>> model.load_background(...)  

Fit a subset of the loaded spectra, using 5 processing pools:

>>> fits = model.fit(row=range(3, 5), column=range(200), n_pools=5)  
>>> fits  
['Successful FitResult with ________ profile of classification 0',
 'Successful FitResult with ________ profile of classification 2',
 ...
 'Successful FitResult with ________ profile of classification 0',
 'Successful FitResult with ________ profile of classification 4']

Merge the fit results into a FitResults object:

>>> results = mcalf.models.FitResults((500, 500), 8)
>>> for fit in fits:  
...     results.append(fit)  

See the fit_spectrum() examples for how to manually provide a spectrum to fit.
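
The n_pools description says the spectra are distributed so that each worker receives the same ratio of classifications. One simple way to get that kind of balance is to group the spectra by classification and deal them out round-robin, as sketched below. This illustrates the idea only and is not necessarily how MCALF partitions the work. split_by_classification is a hypothetical helper.

```python
from collections import defaultdict


def split_by_classification(classifications, n_pools):
    """Hypothetical helper: deal spectrum indices to pools, balanced per class."""
    groups = defaultdict(list)
    for index, label in enumerate(classifications):
        groups[label].append(index)

    pools = [[] for _ in range(n_pools)]
    position = 0
    for label in sorted(groups):
        for index in groups[label]:
            pools[position % n_pools].append(index)  # round-robin assignment
            position += 1
    return pools


classifications = [0, 0, 1, 1, 0, 0, 1, 1]
pools = split_by_classification(classifications, n_pools=2)
print(pools)
```

In this example each pool receives two spectra of each classification, so neither worker is left with only the slower profiles.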

fit_spectrum(spectrum, **kwargs)[source]

Fits the specified spectrum array.

Passes the spectrum argument to the fit() method. Useful for easily iterating over a list of spectra.

Parameters:

spectrum (numpy.ndarray, ndim=1) – The explicit spectrum.
**kwargs – Extra keyword arguments to pass to fit().

Returns:

result – Result of the fit.

Return type:

FitResult

See also

fit: General fitting method.

Examples

Create a basic model:

>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Quickly provide a spectrum and fit it. Remember that the model must be optimised for the spectra that it is asked to fit. In this example the neural network is not called upon to classify the provided spectrum as a classification is provided directly:

>>> spectrum = np.random.rand(30)
>>> model.fit_spectrum(spectrum, classifications=0, background=142.2)  
Successful FitResult with ________ profile of classification 0

As the spectrum is provided manually, any background value must also be provided manually. Alternatively, the background can be subtracted before passing to the function, as by default, no background is subtracted:

>>> model.fit_spectrum(spectrum - 142.2, classifications=0)  
Successful FitResult with ________ profile of classification 0

get_spectra(time=None, row=None, column=None, spectrum=None, correct=True, background=False)[source]

Gets corrected spectra from the spectral array.

Takes either a set of indices or an explicit spectrum and optionally applies corrections and background removal.

Parameters:

time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, a numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectrum (ndarray of ndim=1, optional, default=None) – The explicit spectrum. If provided, time, row, and column are ignored.
correct (bool, optional, default=True) – Whether to reinterpolate the spectrum and apply the prefilter correction (if exists).
background (bool, optional, default=False) – Whether to include the background in the outputted spectra. Only removes the background if the relevant background array has been loaded. Does not remove the background if processing an explicit spectrum.

Returns:

spectra

Return type:

ndarray

Examples

Create a basic model:

>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Provide a single spectrum for processing, and notice output is 1D:

>>> spectrum = model.get_spectra(spectrum=np.random.rand(30))
>>> spectrum.ndim
1

Load an array of spectra:

>>> spectra = np.random.rand(3, 4, 30)
>>> model.load_array(spectra, names=['column', 'row', 'wavelength'])

Extract a single (unprocessed) spectrum from the loaded array, and notice output is 4D:

>>> spectrum = model.get_spectra(row=1, column=0, correct=False)
>>> spectrum.shape
(1, 1, 1, 30)
>>> (spectrum[0, 0, 0] == spectra[0, 1]).all()
True

Extract an array of spectra, and notice output is 4D, and with dimensions time, row, column, wavelength regardless of the original dimensions and order:

>>> spectrum = model.get_spectra(row=range(4), column=range(3))
>>> spectrum.shape
(1, 4, 3, 30)

Notice that the time index can be excluded, as the loaded array only represents a single time. However, in this case leaving out row or column results in an error as it is ambiguous:

>>> spectrum = model.get_spectra(row=range(4))
Traceback (most recent call last):
 ...
ValueError: column index must be specified as multiple indices exist
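
The correction controlled by correct is described as reinterpolating the spectrum onto constant_wavelengths and dividing by prefilter_response. The sketch below shows that sequence for a single spectrum in plain Python. It assumes linear interpolation (with linear extrapolation past the ends), which this page does not specify. reinterpolate is a hypothetical helper.

```python
from bisect import bisect_right


def reinterpolate(original_wavelengths, spectrum, constant_wavelengths,
                  prefilter_response=None):
    """Hypothetical helper: resample a spectrum, then apply a prefilter correction."""
    corrected = []
    for i, wavelength in enumerate(constant_wavelengths):
        # Locate the pair of original samples bracketing this wavelength,
        # clamping to the first or last pair at the edges.
        j = bisect_right(original_wavelengths, wavelength)
        j = min(max(j, 1), len(original_wavelengths) - 1)
        x0, x1 = original_wavelengths[j - 1], original_wavelengths[j]
        y0, y1 = spectrum[j - 1], spectrum[j]
        value = y0 + (y1 - y0) * (wavelength - x0) / (x1 - x0)
        if prefilter_response is not None:
            value /= prefilter_response[i]  # divide by the prefilter response
        corrected.append(value)
    return corrected


resampled = reinterpolate([0.0, 1.0, 2.0], [0.0, 10.0, 20.0],
                          [0.0, 0.5, 1.0, 1.5, 2.0])
print(resampled)
```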

load_array(array, names=None)[source]

Load an array of spectra.

Load array with dimension names names into the array parameter of the model object.

Parameters:

array (numpy.ndarray, ndim>1) – An array containing at least two spectra.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names are ‘time’, ‘row’, ‘column’ and ‘wavelength’. ‘wavelength’ is a required dimension.

See also

load_background: Load an array of spectral backgrounds.

Examples

Create a basic model:

>>> import mcalf.models
>>> from astropy.io import fits
>>> wavelengths = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Load spectra from a file:

>>> spectra = fits.open('spectra_0000.fits')[0].data  
>>> model.load_array(spectra, names=['wavelength', 'column', 'row'])  
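
The names argument tells the model how the axes of the loaded array map onto the [time, row, column, wavelength] layout that get_spectra() returns. The bookkeeping involved can be sketched without NumPy as below. canonical_layout is a hypothetical helper that only computes the axis order and the resulting 4D shape. It is not part of MCALF.

```python
CANONICAL = ("time", "row", "column", "wavelength")


def canonical_layout(names, shape):
    """Hypothetical helper: axis permutation and 4D shape for named dimensions."""
    if "wavelength" not in names:
        raise ValueError("'wavelength' is a required dimension")
    # Order in which the existing axes must be taken to match CANONICAL.
    axes = tuple(names.index(name) for name in CANONICAL if name in names)
    # Dimensions that were not supplied are given a length of 1.
    full_shape = tuple(
        shape[names.index(name)] if name in names else 1 for name in CANONICAL
    )
    return axes, full_shape


axes, full_shape = canonical_layout(["wavelength", "column", "row"], (30, 3, 4))
print(axes, full_shape)
```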

load_background(array, names=None)[source]

Load an array of spectral backgrounds.

Load array with dimension names names into background parameter of the model object.

Parameters:

array (numpy.ndarray, ndim>0) – An array containing at least two backgrounds.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names are ‘time’, ‘row’ and ‘column’.

See also

load_array: Load an array of spectra.

Examples

Create a basic model:

>>> import mcalf.models
>>> from astropy.io import fits
>>> wavelengths = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)

Load background array from a file:

>>> background = fits.open('background_0000.fits')[0].data  
>>> model.load_background(background, names=['column', 'row'])  

test(X, y)[source]

Test the accuracy of the trained neural network.

Prints a table of results showing:

the percentage of predictions that equal the target labels;
the average classification deviation and standard deviation from the ground truth classification for each labelled classification;
the average classification deviation and standard deviation overall.

If the model object has an output parameter, it will create a CSV file (output/neural_network/test.csv) listing the predictions and ground truth data.

Parameters:

X (numpy.ndarray or sparse matrix, shape=(n_spectra, n_wavelengths)) – The input spectra.
y (numpy.ndarray, shape= (n_spectra,) or (n_spectra, n_outputs)) – The target class labels.

See also

train: Train the neural network.
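
The statistics listed above can be reproduced by hand from a list of predictions and ground-truth labels. The sketch below does so with the standard library. summarise is a hypothetical helper, and two details are assumptions: the deviation is taken as the signed difference (prediction minus truth), and the population standard deviation is used.

```python
from statistics import mean, pstdev


def summarise(predictions, truth):
    """Hypothetical helper: accuracy plus mean and spread of deviations."""
    deviations = [p - t for p, t in zip(predictions, truth)]
    # Percentage of predictions that equal the target labels.
    accuracy = 100 * sum(d == 0 for d in deviations) / len(deviations)
    # Mean and standard deviation of the deviation for each true label.
    per_class = {}
    for label in sorted(set(truth)):
        selected = [d for d, t in zip(deviations, truth) if t == label]
        per_class[label] = (mean(selected), pstdev(selected))
    overall = (mean(deviations), pstdev(deviations))
    return accuracy, per_class, overall


accuracy, per_class, overall = summarise([0, 1, 2, 2], [0, 1, 1, 2])
print(accuracy, per_class, overall)
```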

train(X, y)[source]

Fit the neural network model to spectra matrix X and spectra labels y.

Calls the fit() method on the neural_network parameter of the model object.

Parameters:

X (numpy.ndarray or sparse matrix, shape=(n_spectra, n_wavelengths)) – The input spectra.
y (numpy.ndarray, shape= (n_spectra,) or (n_spectra, n_outputs)) – The target class labels.

See also

test: Test how well the neural network has been trained.