Labelling Tutorial

This Jupyter notebook provides a simple, semi-automated method for producing a ground-truth data set that can be used to train a neural network to act as a spectral shape classifier in the MCALF package. The code below uses five classifications (0 to 4) and can be adapted for a different number of classifications.


Load the required packages

[ ]:

import mcalf.models
from mcalf.utils.spec import normalise_spectrum
import requests
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt

Download sample data

[ ]:

path = 'https://raw.githubusercontent.com/ConorMacBride/mcalf/main/examples/data/ibis8542data/'

for file in ('wavelengths.txt', 'spectra.fits'):
    r = requests.get(path + file, allow_redirects=True)
    with open(file, 'wb') as f:
        f.write(r.content)

Load data files

[ ]:

wavelengths = np.loadtxt('wavelengths.txt')  # Original wavelengths

with fits.open('spectra.fits') as hdul:  # Raw spectral data
    datacube = np.asarray(hdul[0].data, dtype=np.float64)
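The later cells assume the datacube has the shape (n_wavelengths, n_rows, n_columns), with one wavelength point per entry in wavelengths.txt. A minimal sketch of a check for that layout, assuming it holds for this data; the helper name check_datacube is hypothetical, and synthetic arrays stand in for the downloaded files:

```python
import numpy as np


def check_datacube(datacube, wavelengths):
    """Hypothetical helper: check the (n_wavelengths, n_rows, n_columns) layout assumed later."""
    if datacube.ndim != 3:
        raise ValueError(f"expected a 3D datacube, got {datacube.ndim} dimensions")
    if datacube.shape[0] != len(wavelengths):
        raise ValueError(
            f"first axis has {datacube.shape[0]} points "
            f"but there are {len(wavelengths)} wavelengths"
        )
    # Return the spatial shape, i.e. the shape of datacube[0]
    return datacube.shape[1:]


# Synthetic stand-ins for wavelengths.txt and spectra.fits
wavelengths = np.linspace(8541.0, 8543.0, 27)
datacube = np.random.default_rng(0).random((27, 10, 12))
print(check_datacube(datacube, wavelengths))  # (10, 12)
```

In the notebook itself the check would be called on the arrays loaded above.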

Initialise the model that will use the labelled data

[ ]:

model = mcalf.models.IBIS8542Model(original_wavelengths=wavelengths)

Randomly select the spectra to label and save their coordinates

[ ]:

n_points = 50

flat_choice = np.random.choice(np.arange(datacube[0].size), n_points, replace=False)
i_points, j_points = np.unravel_index(flat_choice, datacube[0].shape)
np.save('labelled_points.npy', np.array([i_points, j_points]))
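np.random.choice draws a different sample each time the cell runs. If you would like the selection itself to be reproducible, one option (an assumption, not part of the original workflow) is a seeded generator from np.random.default_rng. The sketch also shows how np.unravel_index turns the flat indices back into (row, column) pairs; the seed and the spatial shape are arbitrary stand-ins:

```python
import numpy as np

n_points = 5
spatial_shape = (4, 6)  # stand-in for datacube[0].shape
n_pixels = spatial_shape[0] * spatial_shape[1]

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility
flat_choice = rng.choice(n_pixels, n_points, replace=False)
i_points, j_points = np.unravel_index(flat_choice, spatial_shape)

# With the default row-major order, flat index f is row f // n_columns, column f % n_columns
n_columns = spatial_shape[1]
for f, i, j in zip(flat_choice, i_points, j_points):
    assert (i, j) == (f // n_columns, f % n_columns)
print(list(zip(i_points.tolist(), j_points.tolist())))
```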

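If the notebook is run again in a later session, skip the random selection cell and run only this one, so that the same spectra are loaded from labelled_points.npy.

[ ]: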

i_points, j_points = np.load('labelled_points.npy')
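The two cells above can also be combined so that a new sample is drawn only when no saved coordinates exist. A sketch using os.path.exists; the function load_or_create_points is hypothetical, and the demonstration writes to a temporary directory rather than the notebook's working directory:

```python
import os
import tempfile

import numpy as np


def load_or_create_points(filename, spatial_shape, n_points, rng=None):
    """Hypothetical helper: reuse saved coordinates if present, else draw and save new ones."""
    if os.path.exists(filename):
        i_points, j_points = np.load(filename)
        return i_points, j_points
    rng = np.random.default_rng() if rng is None else rng
    n_pixels = spatial_shape[0] * spatial_shape[1]
    flat_choice = rng.choice(n_pixels, n_points, replace=False)
    i_points, j_points = np.unravel_index(flat_choice, spatial_shape)
    np.save(filename, np.array([i_points, j_points]))
    return i_points, j_points


# Demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'labelled_points.npy')
    first = load_or_create_points(filename, (4, 6), 5)
    second = load_or_create_points(filename, (4, 6), 5)  # loaded from file, not redrawn
    print(np.array_equal(first[0], second[0]) and np.array_equal(first[1], second[1]))  # True
```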

Extract the selected spectra from the datacube

[ ]:

raw_spectra = datacube[:, i_points, j_points].T
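Indexing the wavelength axis with : and the two spatial axes with the coordinate arrays picks out one full spectrum per point, giving an array of shape (n_wavelengths, n_points); the transpose then puts one spectrum in each row. A small check on a synthetic datacube with the same axis order:

```python
import numpy as np

n_wavelengths, n_rows, n_columns = 7, 4, 6
datacube = np.arange(n_wavelengths * n_rows * n_columns, dtype=float).reshape(
    n_wavelengths, n_rows, n_columns
)
i_points = np.array([0, 3, 2])
j_points = np.array([5, 1, 2])

raw_spectra = datacube[:, i_points, j_points].T
print(raw_spectra.shape)  # (3, 7)

# Row k is the spectrum at pixel (i_points[k], j_points[k])
for k, (i, j) in enumerate(zip(i_points, j_points)):
    assert np.array_equal(raw_spectra[k], datacube[:, i, j])
```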

Interpolate each spectrum onto the model's constant wavelength grid and normalise it to the range [0, 1]

[ ]:

labelled_spectra = np.empty((len(raw_spectra), len(model.constant_wavelengths)))
for i in range(len(labelled_spectra)):
    labelled_spectra[i] = normalise_spectrum(raw_spectra[i], model=model)
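The [0, 1] scaling step amounts to a min-max rescaling. The sketch below illustrates only that step with a hypothetical helper, minmax_scale; it is not MCALF's implementation, which also returns each spectrum on the model's constant wavelength grid (hence the len(model.constant_wavelengths) columns above):

```python
import numpy as np


def minmax_scale(spectrum):
    """Hypothetical helper: linearly rescale a spectrum so its minimum is 0 and maximum is 1."""
    spectrum = np.asarray(spectrum, dtype=float)
    lo, hi = spectrum.min(), spectrum.max()
    if hi == lo:
        raise ValueError("cannot rescale a constant spectrum")
    return (spectrum - lo) / (hi - lo)


print(minmax_scale([2.0, 4.0, 3.0, 6.0]).tolist())  # [0.0, 0.5, 0.25, 1.0]
```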

Script to semi-automate the classification process

Type a number from 0 to 4 to assign that classification to the plotted spectrum
Type 5 to skip and move on to the next spectrum
Type back to return to the previous spectrum
Type exit to stop early (the labels already assigned are kept)

The labels are stored in the labels array (-1 marks a spectrum that has not been classified).
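The key handling can be separated from the plotting so that it can be checked without a display. A minimal sketch with a hypothetical function, next_step, that follows the rules listed above (0 to 4 assign a label, 5 skips, back and exit) and never steps back before the first spectrum; tying the skip key to n_classes is an assumption made so the sketch adapts to other numbers of classifications:

```python
import numpy as np


def next_step(response, i, labels, n_classes=5):
    """Hypothetical helper: apply one typed response.

    Returns the new index, or None if the user typed 'exit'.
    Labels are updated in place; any other input leaves i unchanged.
    """
    response = response.strip()
    if response == 'exit':
        return None
    if response == 'back':
        return max(i - 1, 0)  # never go before the first spectrum
    try:
        value = int(response)
    except ValueError:
        return i  # not a number: show the same spectrum again
    if 0 <= value < n_classes:
        labels[i] = value
        return i + 1
    if value == n_classes:  # the skip key (5 when there are five classes)
        return i + 1
    return i


labels = np.full(3, -1, dtype=int)
i = 0
for response in ['2', 'x', '5', 'back', '4', 'exit']:
    i = next_step(response, i, labels)
    if i is None:
        break
print(labels.tolist())  # [2, 4, -1]
```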

[ ]:

labels = np.full(len(labelled_spectra), -1, dtype=int)
i = 0
while i < len(labelled_spectra):

    # Show the spectrum to be classified along with description
    plt.figure(figsize=(15, 10))
    plt.plot(labelled_spectra[i])
    plt.show()
    print("i = {}".format(i))
    print("absorption --- both --- emission / skip")
    print("       0    1    2    3    4         5 ")

    # Ask for the user's classification
    classification = input('Type [0-5], back or exit: ').strip()

    try:  # Must be an integer to be a classification
        classification_int = int(classification)
    except ValueError:
        classification_int = -1  # Not an integer (e.g. 'back' or 'exit')

    if classification == 'back':
        i = max(i - 1, 0)  # Go back to the previous spectrum, but not before the first
    elif classification == 'exit':
        break  # Exit the loop, keeping the labels already given
    elif 0 <= classification_int <= 4:  # Valid classification
        labels[i] = classification_int  # Assign the classification to the spectrum
        i += 1  # Move on to the next spectrum
    elif classification_int == 5:
        i += 1  # Skip and move on to the next spectrum
    else:  # Invalid input
        pass  # Show the current spectrum again
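If the loop was exited early, one way to resume (an assumption, not part of the original notebook) is to start from the first spectrum still labelled -1. np.flatnonzero gives the indices of the unlabelled spectra:

```python
import numpy as np

labels = np.array([3, 0, -1, 4, -1])  # example state after exiting early

unlabelled = np.flatnonzero(labels == -1)
print(unlabelled.tolist())  # [2, 4]

# Index to restart from, or len(labels) if everything has been labelled
i = int(unlabelled[0]) if unlabelled.size else len(labels)
print(i)  # 2
```

Spectra skipped with 5 also keep the label -1, so they would be offered again. To resume, the loop cell would need to be run without its first two lines, since those reset labels and i.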

Plot a bar chart of the number of spectra in each classification (the bar at -1 counts the spectra left unclassified)

[ ]:

unique, counts = np.unique(labels, return_counts=True)
plt.figure()
plt.bar(unique, counts)
plt.title('Number of spectra in each classification')
plt.xlabel('Classification')
plt.ylabel('N_spectra')
plt.show()
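np.unique only reports classes that occur, and includes -1 whenever some spectra were not classified. To count all five classes, including empty ones, while leaving out the unclassified spectra, np.bincount with minlength is an alternative; the resulting counts could be passed to plt.bar(np.arange(5), counts) in place of the values above. The labels here are example values:

```python
import numpy as np

labels = np.array([0, 2, 2, -1, 4, 0, 2])  # example labels
n_classes = 5

classified = labels[labels >= 0]  # drop unclassified (-1) spectra
counts = np.bincount(classified, minlength=n_classes)
print(counts.tolist())  # [2, 0, 3, 0, 1]
print(int(np.sum(labels == -1)))  # 1
```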

Overplot the spectra of each classification

[ ]:

for classification in unique:
    plt.figure()
    for spectrum in labelled_spectra[labels == classification]:
        plt.plot(model.constant_wavelengths, spectrum)
    plt.title('Classification {}'.format(classification))
    plt.yticks([0, 1])
    plt.show()

Save the labelled spectra for use later

[ ]:

np.save('labelled_data.npy', labelled_spectra)
np.save('labels.npy', labels)
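When the saved files are loaded again for training, spectra that were skipped or never reached still carry the label -1. Assuming these should not be used as training examples, a boolean mask removes them. The sketch writes synthetic arrays to a temporary directory in place of the real labelled_data.npy and labels.npy:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, 'labelled_data.npy')
    labels_file = os.path.join(tmp, 'labels.npy')
    np.save(data_file, np.random.default_rng(1).random((4, 10)))  # synthetic spectra
    np.save(labels_file, np.array([1, -1, 3, 0]))  # synthetic labels

    X = np.load(data_file)  # loaded fully into memory
    y = np.load(labels_file)

mask = y != -1  # keep only classified spectra
X_train, y_train = X[mask], y[mask]
print(X_train.shape, y_train.tolist())  # (3, 10) [1, 3, 0]
```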