ModelBase
- class mcalf.models.ModelBase(*, original_wavelengths=None, constant_wavelengths=None, delta_lambda=0.05, sigma=None, prefilter_response=None, prefilter_ref_main=None, prefilter_ref_wvscl=None, output=None, config=None)
Bases: object
Base class for spectral line model fitting.
Warning
This class should not be used directly. Use derived classes instead.
- Parameters
original_wavelengths (array_like) – One-dimensional array of wavelengths that correspond to the uncorrected spectral data.
stationary_line_core (float, optional, default=None) – Wavelength of the stationary line core.
constant_wavelengths (array_like, ndim=1, optional, default= see description) – The desired set of wavelengths that the spectral data should be rescaled to represent. It is assumed that these have constant spacing, but that may not be a requirement if you specify your own array. The default value is an array from the minimum to the maximum wavelength of original_wavelengths in constant steps of delta_lambda, overshooting the upper bound if the maximum wavelength has not been reached.
delta_lambda (float, optional, default=0.05) – The step used between each value of constant_wavelengths when its default value has to be calculated.
sigma (optional, default=None) – Sigma values used to weight the fit. This attribute should be set by a child class of ModelBase.
prefilter_response (array_like, length=n_wavelengths, optional, default= see note) – Each constant wavelength scaled spectrum will be corrected by dividing it by this array. If prefilter_response is not given, and prefilter_ref_main and prefilter_ref_wvscl are not given, prefilter_response will have a default value of None.
prefilter_ref_main (array_like, optional, default= None) – If prefilter_response is not specified, this will be used along with prefilter_ref_wvscl to generate the default value of prefilter_response.
prefilter_ref_wvscl (array_like, optional, default=None) – If prefilter_response is not specified, this will be used along with prefilter_ref_main to generate the default value of prefilter_response.
config (str, optional, default=None) – Filename of a .yml file (relative to current directory) containing the initialising parameters for this object. Parameters provided explicitly to the object upon initialisation will override any provided in this file. All (or some) parameters that this object accepts can be specified in this file, except neural_network and config. Each line of the file should specify a different parameter and be formatted like emission_guess: '[-inf, wl-0.15, 1e-6, 1e-6]' or original_wavelengths: 'original.fits' for example. When specifying a string, use 'inf' to represent np.inf and 'wl' to represent stationary_line_core as shown. If the string matches a file, mcalf.utils.misc.load_parameter() is used to load the contents of the file.
output (str, optional, default=None) – If the program wants to output data, it will place it relative to the location specified by this parameter. Some methods will only save data to a file if this parameter is not None. Such cases will be documented where relevant.
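The string convention described for config can be illustrated with a small sketch that expands a list-like string such as '[-inf, wl-0.15, 1e-6, 1e-6]' into numbers. The function name parse_config_list is hypothetical, and this is not how mcalf.utils.misc.load_parameter() is implemented. It only assumes that each term is either a plain number (including inf) or wl followed by an optional signed offset.

```python
def parse_config_list(text, stationary_line_core):
    """Hypothetical helper (not part of MCALF): expand 'inf' and 'wl' terms."""
    values = []
    for term in text.strip().strip("[]").split(","):
        term = term.replace(" ", "")
        if term.startswith("wl"):
            # 'wl' alone means the line core; 'wl-0.15' means core minus 0.15.
            offset = float(term[2:]) if len(term) > 2 else 0.0
            values.append(stationary_line_core + offset)
        else:
            # float() already understands 'inf', '-inf' and '1e-6'.
            values.append(float(term))
    return values


bounds = parse_config_list("[-inf, wl-0.15, 1e-6, 1e-6]", 8542.09)
```

Here bounds starts with negative infinity, followed by a wavelength 0.15 below the assumed line core of 8542.09.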
- original_wavelengths
One-dimensional array of wavelengths that correspond to the uncorrected spectral data.
- Type
array_like
- neural_network
The neural network classifier object that is used to classify spectra. This attribute should be set by a child class of ModelBase.
- Type
optional, default=None
- constant_wavelengths
The desired set of wavelengths that the spectral data should be rescaled to represent. It is assumed that these have constant spacing, but that may not be a requirement if you specify your own array. The default value is an array from the minimum to the maximum wavelength of original_wavelengths in constant steps of delta_lambda, overshooting the upper bound if the maximum wavelength has not been reached.
- Type
array_like, ndim=1, optional, default= see description
- sigma
Sigma values used to weight the fit. This attribute should be set by a child class of ModelBase.
- Type
optional, default=None
- prefilter_response
Each constant wavelength scaled spectrum will be corrected by dividing it by this array. If prefilter_response is not given, and prefilter_ref_main and prefilter_ref_wvscl are not given, prefilter_response will have a default value of None.
- Type
array_like, length=n_wavelengths, optional, default= see note
- output
If the program wants to output data, it will place it relative to the location specified by this parameter. Some methods will only save data to a file if this parameter is not None. Such cases will be documented where relevant.
- Type
str, optional, default=None
- array
Array holding spectra.
- Type
numpy.ndarray, dimensions are [‘time’, ‘row’, ‘column’, ‘spectra’]
- background
Array holding spectral backgrounds.
- Type
numpy.ndarray, dimensions are [‘time’, ‘row’, ‘column’]
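The default value of constant_wavelengths described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MCALF implementation: the function name default_constant_wavelengths and the small tolerance used to avoid an extra step from floating-point rounding are both hypothetical, and plain Python lists are used so the sketch has no dependencies.

```python
import math


def default_constant_wavelengths(original_wavelengths, delta_lambda=0.05):
    """Hypothetical helper (not part of MCALF): evenly spaced wavelength grid."""
    lower = min(original_wavelengths)
    upper = max(original_wavelengths)
    # Number of steps needed to reach or pass the maximum wavelength.
    # The tolerance (an assumption) guards against rounding such as 4.0000000001.
    n_steps = math.ceil((upper - lower) / delta_lambda - 1e-9)
    return [lower + i * delta_lambda for i in range(n_steps + 1)]


grid = default_constant_wavelengths([0.0, 0.4, 1.1], delta_lambda=0.25)
```

With these inputs the grid has six points and its last value is 1.25, which overshoots the maximum of 1.1 because 1.1 does not fall on a grid point.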
Attributes Summary
Methods Summary
_curve_fit(model, spectrum, guess, sigma, ...) – scipy.optimize.curve_fit() wrapper with error handling.
_fit(spectrum[, classification, spectrum_index]) – Fit a single spectrum for the given profile or classification.
_get_time_row_column([time, row, column]) – Validate and infer the time, row and column index.
_load_data(array[, names, target]) – Load a specified array into the model object.
_set_prefilter() – Set the prefilter_response parameter.
_validate_base_attributes() – Validate some of the object's attributes.
classify_spectra([time, row, column, ...]) – Classify the specified spectra.
fit([time, row, column, spectrum, ...]) – Fits the model to specified spectra.
fit_spectrum(spectrum, **kwargs) – Fits the specified spectrum array.
get_spectra([time, row, column, spectrum, ...]) – Gets corrected spectra from the spectral array.
load_array(array[, names]) – Load an array of spectra.
load_background(array[, names]) – Load an array of spectral backgrounds.
test(X, y) – Test the accuracy of the trained neural network.
train(X, y) – Fit the neural network model to spectra matrix X and spectra labels y.
Attributes Documentation
- default_kwargs = {'constant_wavelengths': None, 'delta_lambda': 0.05, 'original_wavelengths': None, 'output': None, 'prefilter_ref_main': None, 'prefilter_ref_wvscl': None, 'prefilter_response': None, 'sigma': None, 'stationary_line_core': None}
- default_modelbase_kwargs = {'constant_wavelengths': None, 'delta_lambda': 0.05, 'original_wavelengths': None, 'output': None, 'prefilter_ref_main': None, 'prefilter_ref_wvscl': None, 'prefilter_response': None, 'sigma': None, 'stationary_line_core': None}
- stationary_line_core
Methods Documentation
- _curve_fit(model, spectrum, guess, sigma, bounds, x_scale, time=None, row=None, column=None)
scipy.optimize.curve_fit() wrapper with error handling.
Passes a certain set of parameters to the scipy.optimize.curve_fit() function and catches some typical errors, presenting a more specific warning message.
- Parameters
model (callable) – The model function, f(x, …). It must take the ModelBase.constant_wavelengths attribute as the first argument and the parameters to fit as separate remaining arguments.
spectrum (array_like) – The dependent data, with length equal to that of the ModelBase.constant_wavelengths attribute.
guess (array_like, optional) – Initial guess for the parameters to fit.
sigma (array_like) – Determines the uncertainty in the spectrum. Used to weight certain regions of the spectrum.
bounds (2-tuple of array_like) – Lower and upper bounds on each parameter.
x_scale (array_like) – Characteristic scale of each parameter.
time (optional, default=None) – The time index for error handling.
row (optional, default=None) – The row index for error handling.
column (optional, default=None) – The column index for error handling.
- Returns
fitted_parameters (numpy.ndarray, length=n_parameters) – The parameters that recreate the model fitted to the spectrum.
success (bool) – Whether the fit was successful or an error had to be handled.
See also
fit
General fitting method.
fit_spectrum
Explicit spectrum fitting method.
Notes
More details can be found in the documentation for scipy.optimize.curve_fit() and scipy.optimize.least_squares().
- _fit(spectrum, classification=None, spectrum_index=None)
Fit a single spectrum for the given profile or classification.
Warning
This call signature and docstring specify how the _fit method must be implemented in each subclass of ModelBase. It is not implemented in this class.
- Parameters
spectrum (numpy.ndarray, ndim=1, length=n_constant_wavelengths) – The spectrum to be fitted.
classification (int, optional, default=None) – Classification to determine the fitted profile to use.
spectrum_index (array_like or list or tuple, length=3, optional, default=None) – The [time, row, column] index of the spectrum provided. Only used for error reporting.
- Returns
result – Outcome of the fit returned in a mcalf.models.FitResult object.
- Return type
mcalf.models.FitResult
See also
fit
The recommended method for fitting spectra.
mcalf.models.FitResult
The object that the fit method returns.
Notes
This method is called for each requested spectrum by the models.ModelBase.fit() method. This is where most of the adjustments to the fitting method should be made. See other subclasses of models.ModelBase for examples of how to implement this method in a new subclass. See models.ModelBase.fit() for more information on how this method is called.
- _get_time_row_column(time=None, row=None, column=None)
Validate and infer the time, row and column index.
Takes any time, row and column index given. Any index that is not specified is returned as 0 if the spectral array has only one value along that dimension. If the dimension has multiple values and no index is specified, an error is raised due to the ambiguity.
- Parameters
time (optional, default=None) – The time index.
row (optional, default=None) – The row index.
column (optional, default=None) – The column index.
- Returns
time – The corrected time index.
row – The corrected row index.
column – The corrected column index.
See also
mcalf.utils.misc.make_iter
Make a variable iterable.
Notes
No type checking is done on the input indices so it can be anything but in most cases will need to be either an integer or iterable. The mcalf.utils.misc.make_iter() function can be used to make indices iterable.
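The inference rule described for this method can be sketched in a few lines. This is an illustration only, not the MCALF source: the function name infer_indices and the idea of passing the (time, row, column) lengths as a tuple are assumptions made for the example, while the error wording follows the message shown in the get_spectra examples.

```python
def infer_indices(shape, time=None, row=None, column=None):
    """Hypothetical helper (not part of MCALF).

    shape holds the lengths of the (time, row, column) dimensions.
    """
    given = {"time": time, "row": row, "column": column}
    resolved = []
    for (name, index), length in zip(given.items(), shape):
        if index is None:
            if length != 1:
                # More than one candidate: refuse to guess.
                raise ValueError(
                    f"{name} index must be specified as multiple indices exist"
                )
            index = 0  # Only one value along this dimension, so use it.
        resolved.append(index)
    return tuple(resolved)


indices = infer_indices((1, 4, 3), row=1, column=0)  # time is inferred as 0
```

As in the method itself, no type checking is applied to indices that are given, so an integer or an iterable passes through unchanged.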
- _load_data(array, names=None, target=None)
Load a specified array into the model object.
Load array with dimension names names into the attribute specified by target.
- Parameters
array (numpy.ndarray) – The array to load.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names depend on target.
target ({'array', 'background'}) – The attribute to load the array into.
See also
load_array
Load an array of spectra.
load_background
Load an array of spectral backgrounds.
- _set_prefilter()
Set the prefilter_response parameter.
Deprecated since version 0.2: Prefilter response correction code, and prefilter_response, prefilter_ref_main and prefilter_ref_wvscl, may be removed in a later release of MCALF. Spectra should be fully processed before loading into MCALF.
This method should be called in a child class once stationary_line_core has been set.
- _validate_base_attributes()
Validate some of the object’s attributes.
- Raises
ValueError – To signal that an attribute is not valid.
- classify_spectra(time=None, row=None, column=None, spectra=None, only_normalise=False)
Classify the specified spectra.
Will also normalise each spectrum such that its intensity will range from zero to one.
- Parameters
time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, a numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectra (numpy.ndarray, optional, default=None) – The explicit spectra to classify. If only_normalise is False, this must be 1D. However, if only_normalise is set to true, spectra can be of any dimension. It is assumed that the final dimension is wavelengths, so return shape will be the same as spectra, except with no final wavelengths dimension.
only_normalise (bool, optional, default=False) – Whether the single spectrum given in spectra should not be interpolated and corrected. If set to true, the only processing applied to spectra will be a normalisation to be in range 0 to 1.
- Returns
classifications – Array of classifications with the same time, row and column indices as spectra.
- Return type
numpy.ndarray
See also
train
Train the neural network.
test
Test the accuracy of the neural network.
get_spectra
Get processed spectra from the object's array attribute.
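The normalisation mentioned above, where each spectrum is rescaled so that its intensity ranges from zero to one, can be sketched as a min-max rescaling. This is an assumption about the form of the normalisation rather than a copy of the MCALF code; the function name normalise and the handling of a flat spectrum are hypothetical.

```python
def normalise(spectrum):
    """Hypothetical helper (not part of MCALF): rescale intensities to [0, 1]."""
    lowest = min(spectrum)
    highest = max(spectrum)
    if highest == lowest:
        # Assumption: a flat spectrum has no range, so map it to all zeros.
        return [0.0 for _ in spectrum]
    return [(value - lowest) / (highest - lowest) for value in spectrum]


scaled = normalise([2.0, 4.0, 3.0])  # [0.0, 1.0, 0.5]
```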
Examples
Create a basic model:
>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8542.1, 8542.2, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Load a trained neural network:
>>> import pickle
>>> with open('trained_neural_network.pkl', 'rb') as pkl:
...     model.neural_network = pickle.load(pkl)
Classify an individual spectrum:
>>> spectrum = np.random.rand(30)
>>> model.classify_spectra(spectra=spectrum)
array([2])
When only_normalise=True, classify an n-dimensional spectral array:
>>> spectra = np.random.rand(5, 4, 3, 2, 30)
>>> model.classify_spectra(spectra=spectra, only_normalise=True).shape
(5, 4, 3, 2)
Load spectra from a file and classify:
>>> from astropy.io import fits
>>> spectra = fits.open('spectra_0000.fits')[0].data
>>> model.load_array(spectra, names=['wavelength', 'column', 'row'])
>>> model.classify_spectra(column=range(10, 15), row=[7, 16])
array([[[0, 2, 0, 3, 0],
        [4, 0, 1, 0, 0]]])
- fit(time=None, row=None, column=None, spectrum=None, classifications=None, background=None, n_pools=None, **kwargs)
Fits the model to specified spectra.
Fits the model to an array of spectra using multiprocessing if requested.
- Parameters
time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, a numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectrum (numpy.ndarray, ndim=1, optional, default=None) – The explicit spectrum to fit the model to.
classifications (int or array_like, optional, default=None) – Classifications to determine the fitted profile to use. If not provided, the neural network will be used to classify the spectra. If a multidimensional array, must have the same shape as [time, row, column]. Dimensions that would have length of 1 can be excluded.
background (float, optional, default=None) – If provided, this value will be subtracted from the explicit spectrum provided in spectrum. It will not be applied to spectra found from the indices; use the load_background() method instead.
n_pools (int, optional, default=None) – The number of processing pools to calculate the fitting over. This allocates the fitting of different spectra to n_pools separate worker processes. When processing a large number of spectra this will make the fitting process take less time overall. It also distributes such that each worker process has the same ratio of classifications to process. This should balance out the workload between workers. If few spectra are being fitted, performance may decrease due to the overhead associated with splitting the evaluation over separate processes. If n_pools is not an integer greater than zero, it will fit the spectrum with a for loop.
**kwargs (dict, optional) – Extra keyword arguments to pass to _fit().
- Returns
result – Outcome of the fits returned as a list of FitResult objects.
- Return type
list of FitResult, length=n_spectra
Examples
Create a basic model:
>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Set up the neural network classifier:
>>> model.neural_network = ...  # load an untrained classifier
>>> model.train(...)
>>> model.test(...)
Load the spectra and background array:
>>> model.load_array(...)
>>> model.load_background(...)
Fit a subset of the loaded spectra, using 5 processing pools:
>>> fits = model.fit(row=range(3, 5), column=range(200), n_pools=5)
>>> fits
['Successful FitResult with ________ profile of classification 0',
 'Successful FitResult with ________ profile of classification 2',
 ...
 'Successful FitResult with ________ profile of classification 0',
 'Successful FitResult with ________ profile of classification 4']
Merge the fit results into a FitResults object:
>>> results = mcalf.models.FitResults((500, 500), 8)
>>> for fit in fits:
...     results.append(fit)
See the fit_spectrum() examples for how to manually provide a spectrum to fit.
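The n_pools description says that each worker process receives the same ratio of classifications. One way such a balanced split could work is to group the spectra by classification and deal them out to the workers in turn. The sketch below shows that idea only; it is an assumption, not the scheduling code used by MCALF, and the function name split_by_classification is hypothetical.

```python
from collections import defaultdict


def split_by_classification(classifications, n_pools):
    """Hypothetical helper (not part of MCALF): balanced round-robin split."""
    # Group the spectrum indices by their classification label.
    by_class = defaultdict(list)
    for index, label in enumerate(classifications):
        by_class[label].append(index)

    # Deal the indices out one at a time, one class after another, so each
    # pool ends up with a similar share of every classification.
    pools = [[] for _ in range(n_pools)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            pools[position % n_pools].append(index)
            position += 1
    return pools


pools = split_by_classification([0, 0, 0, 0, 1, 1, 2, 2], n_pools=2)
# pools == [[0, 2, 4, 6], [1, 3, 5, 7]]
```

In this example each pool holds two spectra of classification 0 and one each of classifications 1 and 2, so neither worker is left with only the slow or only the fast profiles.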
- fit_spectrum(spectrum, **kwargs)
Fits the specified spectrum array.
Passes the spectrum argument to the fit() method. Useful for easily iterating over a list of spectra.
- Parameters
spectrum (numpy.ndarray, ndim=1) – The explicit spectrum.
**kwargs (dict, optional) – Extra keyword arguments to pass to fit().
- Returns
result – Result of the fit.
- Return type
FitResult
See also
fit
General fitting method.
Examples
Create a basic model:
>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Quickly provide a spectrum and fit it. Remember that the model must be optimised for the spectra that it is asked to fit. In this example the neural network is not called upon to classify the provided spectrum as a classification is provided directly:
>>> spectrum = np.random.rand(30)
>>> model.fit_spectrum(spectrum, classifications=0, background=142.2)
Successful FitResult with ________ profile of classification 0
As the spectrum is provided manually, any background value must also be provided manually. Alternatively, the background can be subtracted before passing to the function, as by default, no background is subtracted:
>>> model.fit_spectrum(spectrum - 142.2, classifications=0)
Successful FitResult with ________ profile of classification 0
- get_spectra(time=None, row=None, column=None, spectrum=None, correct=True, background=False)
Gets corrected spectra from the spectral array.
Takes either a set of indices or an explicit spectrum and optionally applies corrections and background removal.
- Parameters
time (int or iterable, optional, default=None) – The time index. The index can be either a single integer index or an iterable. E.g. a list, a numpy.ndarray, a Python range, etc. can be used.
row (int or iterable, optional, default=None) – The row index. See comment for time parameter.
column (int or iterable, optional, default=None) – The column index. See comment for time parameter.
spectrum (ndarray of ndim=1, optional, default=None) – The explicit spectrum. If provided, time, row, and column are ignored.
correct (bool, optional, default=True) – Whether to reinterpolate the spectrum and apply the prefilter correction (if exists).
background (bool, optional, default=False) – Whether to include the background in the outputted spectra. Only removes the background if the relevant background array has been loaded. Does not remove background if processing an explicit spectrum.
- Returns
spectra
- Return type
ndarray
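Spectra selected by index are returned with the dimensions time, row, column and wavelength, whatever order the array was loaded in. The sketch below works out the shape such a standardised array would have from the loaded shape and its dimension names, assuming that any dimension not named is given a length of one. The function name standard_shape is hypothetical and the sketch does not reorder any data.

```python
def standard_shape(shape, names):
    """Hypothetical helper (not part of MCALF).

    Returns the (time, row, column, wavelength) shape for an array that was
    loaded with the given shape and dimension names.
    """
    order = ["time", "row", "column", "wavelength"]
    lengths = dict(zip(names, shape))
    # Dimensions absent from names are assumed to have a single entry.
    return tuple(lengths.get(name, 1) for name in order)


shape = standard_shape((3, 4, 30), ["column", "row", "wavelength"])
# shape == (1, 4, 3, 30)
```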
Examples
Create a basic model:
>>> import mcalf.models
>>> import numpy as np
>>> wavelengths = np.linspace(8541.3, 8542.7, 30)
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Provide a single spectrum for processing, and notice output is 1D:
>>> spectrum = model.get_spectra(spectrum=np.random.rand(30))
>>> spectrum.ndim
1
Load an array of spectra:
>>> spectra = np.random.rand(3, 4, 30)
>>> model.load_array(spectra, names=['column', 'row', 'wavelength'])
Extract a single (unprocessed) spectrum from the loaded array, and notice output is 4D:
>>> spectrum = model.get_spectra(row=1, column=0, correct=False)
>>> spectrum.shape
(1, 1, 1, 30)
>>> (spectrum[0, 0, 0] == spectra[0, 1]).all()
True
Extract an array of spectra, and notice output is 4D, and with dimensions time, row, column, wavelength regardless of the original dimensions and order:
>>> spectrum = model.get_spectra(row=range(4), column=range(3))
>>> spectrum.shape
(1, 4, 3, 30)
Notice that the time index can be excluded, as the loaded array only represents a single time. However, in this case leaving out row or column results in an error as it is ambiguous:
>>> spectrum = model.get_spectra(row=range(4))
Traceback (most recent call last):
 ...
ValueError: column index must be specified as multiple indices exist
- load_array(array, names=None)
Load an array of spectra.
Load array with dimension names names into the array parameter of the model object.
- Parameters
array (numpy.ndarray, ndim>1) – An array containing at least two spectra.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names are ‘time’, ‘row’, ‘column’ and ‘wavelength’. ‘wavelength’ is a required dimension.
See also
load_background
Load an array of spectral backgrounds.
Examples
Create a basic model:
>>> import mcalf.models
>>> from astropy.io import fits
>>> wavelengths = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Load spectra from a file:
>>> spectra = fits.open('spectra_0000.fits')[0].data
>>> model.load_array(spectra, names=['wavelength', 'column', 'row'])
- load_background(array, names=None)
Load an array of spectral backgrounds.
Load array with dimension names names into background parameter of the model object.
- Parameters
array (numpy.ndarray, ndim>0) – An array containing at least two backgrounds.
names (list of str, length=`array.ndim`) – List of dimension names for array. Valid dimension names are ‘time’, ‘row’ and ‘column’.
See also
load_array
Load an array of spectra.
Examples
Create a basic model:
>>> import mcalf.models
>>> from astropy.io import fits
>>> wavelengths = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
>>> model = mcalf.models.ModelBase(original_wavelengths=wavelengths)
Load background array from a file:
>>> background = fits.open('background_0000.fits')[0].data
>>> model.load_background(background, names=['column', 'row'])
- test(X, y)
Test the accuracy of the trained neural network.
Prints a table of results showing:
the percentage of predictions that equal the target labels;
the average classification deviation and standard deviation from the ground truth classification for each labelled classification;
the average classification deviation and standard deviation overall.
If the model object has an output parameter, it will create a CSV file (output/neural_network/test.csv) listing the predictions and ground truth data.
- Parameters
X (numpy.ndarray or sparse matrix, shape=(n_spectra, n_wavelengths)) – The input spectra.
y (numpy.ndarray, shape= (n_spectra,) or (n_spectra, n_outputs)) – The target class labels.
See also
train
Train the neural network.
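The three quantities listed for test can be computed along the following lines. This is a sketch under stated assumptions rather than the MCALF implementation: the deviation is taken to be the signed difference between predicted and true classification, the population standard deviation is used, and the function name summarise_predictions is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarise_predictions(predictions, truth):
    """Hypothetical helper (not part of MCALF): accuracy and deviations."""
    # Assumption: deviation means predicted label minus true label.
    deviations = [p - t for p, t in zip(predictions, truth)]
    percent_correct = 100 * sum(d == 0 for d in deviations) / len(deviations)

    # Collect the deviations separately for each ground truth classification.
    per_class = defaultdict(list)
    for label, deviation in zip(truth, deviations):
        per_class[label].append(deviation)
    table = {
        label: (mean(values), pstdev(values))
        for label, values in sorted(per_class.items())
    }

    overall = (mean(deviations), pstdev(deviations))
    return percent_correct, table, overall


percent, table, overall = summarise_predictions([0, 1, 2, 2], [0, 1, 1, 2])
```

For these four spectra, three predictions match, so percent is 75.0, and the one misclassified spectrum of true classification 1 gives that class a mean deviation of 0.5.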
- train(X, y)
Fit the neural network model to spectra matrix X and spectra labels y.
Calls the fit() method on the neural_network parameter of the model object.
- Parameters
X (numpy.ndarray or sparse matrix, shape=(n_spectra, n_wavelengths)) – The input spectra.
y (numpy.ndarray, shape= (n_spectra,) or (n_spectra, n_outputs)) – The target class labels.
See also
test
Test how well the neural network has been trained.