Labelling Tutorial
This Jupyter notebook provides a simple, semi-automated, method to produce a ground truth data set that can be used to train a neural network for use as a spectral shape classifier in the MCALF package. The following code can be adapted depending on the number of classifications that you want.
Download LabellingTutorial.ipynb
Load the required packages
[ ]:
import mcalf.models
from mcalf.utils.spec import normalise_spectrum
import requests
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt
Download sample data
[ ]:
path = 'https://raw.githubusercontent.com/ConorMacBride/mcalf/main/examples/data/ibis8542data/'
for file in ('wavelengths.txt', 'spectra.fits'):
r = requests.get(path + file, allow_redirects=True)
with open(file, 'wb') as f:
f.write(r.content)
Load data files
[ ]:
wavelengths = np.loadtxt('wavelengths.txt') # Original wavelengths
with fits.open('spectra.fits') as hdul: # Raw spectral data
datacube = np.asarray(hdul[0].data, dtype=np.float64)
Initialise the model that will use the labelled data
[ ]:
model = mcalf.models.IBIS8542Model(original_wavelengths=wavelengths)
Select the spectra to label
[ ]:
n_points = 50
flat_choice = np.random.choice(np.arange(datacube[0].size), n_points, replace=False)
i_points, j_points = np.unravel_index(flat_choice, datacube[0].shape)
np.save('labelled_points.npy', np.array([i_points, j_points]))
[ ]:
i_points, j_points = np.load('labelled_points.npy')
Select the spectra to label from the data file
[ ]:
raw_spectra = datacube[:, i_points, j_points].T
Normalise each spectrum to be in range [0, 1]
[ ]:
labelled_spectra = np.empty((len(raw_spectra), len(model.constant_wavelengths)))
for i in range(len(labelled_spectra)):
labelled_spectra[i] = normalise_spectrum(raw_spectra[i], model=model)
Script to semi-automate the classification process
Type a number 0 - 4 for assign a classification to the plotted spectrum
Type 5 to skip and move on to the next spectrum
Type
backto move to the previous spectrumType
exitto give up (keeping ones already done)
The labels are present in the labels variable (-1 represents an unclassified spectrum)
[ ]:
labels = np.full(len(labelled_spectra), -1, dtype=int)
i = 0
while i < len(labelled_spectra):
# Show the spectrum to be classified along with description
plt.figure(figsize=(15, 10))
plt.plot(labelled_spectra[i])
plt.show()
print("i = {}".format(i))
print("absorption --- both --- emission / skip")
print(" 0 1 2 3 4 5 ")
# Ask for user's classification
classification = input('Type [0-4]:')
try: # Must be an integer
classification_int = int(classification)
except ValueError:
classification_int = -1 # Try current spectrum again
if classification == 'back':
i -= 1 # Go back to the previous spectrum
elif classification == 'exit':
break # Exit the loop, saving labels that were given
elif 0 <= classification_int <= 4: # Valid classification
labels[i] = int(classification) # Assign the classification to the spectrum
i += 1 # Move on to the next spectrum
elif classification_int == 5:
i += 1 # Skip and move on to the next spectrum
else: # Invalid integer classification
i += 0 # Try current spectrum again
Plot bar chart of classification populations
[ ]:
unique, counts = np.unique(labels, return_counts=True)
plt.figure()
plt.bar(unique, counts)
plt.title('Number of spectra in each classification')
plt.xlabel('Classification')
plt.ylabel('N_spectra')
plt.show()
Overplot the spectra of each classification
[ ]:
for classification in unique:
plt.figure()
for spectrum in labelled_spectra[labels == classification]:
plt.plot(model.constant_wavelengths, spectrum)
plt.title('Classification {}'.format(classification))
plt.yticks([0, 1])
plt.show()
Save the labelled spectra for use later
[ ]:
np.save('labelled_data.npy', labelled_spectra)
np.save('labels.npy', labels)