Labelling Tutorial
This Jupyter notebook provides a simple, semi-automated method for producing a ground truth data set that can be used to train a neural network as a spectral shape classifier in the MCALF package. The code below uses five classifications (0 to 4) and can be adapted if you want a different number.
Load the required packages
[ ]:
import mcalf.models
from mcalf.utils.spec import normalise_spectrum
import requests
import numpy as np
from astropy.io import fits
import matplotlib.pyplot as plt
Download sample data
[ ]:
path = 'https://raw.githubusercontent.com/ConorMacBride/mcalf/main/examples/data/ibis8542data/'
for file in ('wavelengths.txt', 'spectra.fits'):
    r = requests.get(path + file, allow_redirects=True)
    with open(file, 'wb') as f:
        f.write(r.content)
Load data files
[ ]:
wavelengths = np.loadtxt('wavelengths.txt') # Original wavelengths
with fits.open('spectra.fits') as hdul:  # Raw spectral data
    datacube = np.asarray(hdul[0].data, dtype=np.float64)
Initialise the model that will use the labelled data
[ ]:
model = mcalf.models.IBIS8542Model(original_wavelengths=wavelengths)
Randomly select the points to label and save them so the same selection can be reloaded later
[ ]:
n_points = 50
flat_choice = np.random.choice(np.arange(datacube[0].size), n_points, replace=False)
i_points, j_points = np.unravel_index(flat_choice, datacube[0].shape)
np.save('labelled_points.npy', np.array([i_points, j_points]))
[ ]:
i_points, j_points = np.load('labelled_points.npy')
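Because `np.random.choice` is unseeded here, each run picks different points, which is why the selection is saved to `labelled_points.npy` and reloaded. As an alternative, the following is a minimal sketch of a repeatable selection using a seeded generator. The image shape and the seed value are made up for illustration and are not taken from the sample data.

```python
import numpy as np

# Hypothetical stand-in for datacube[0].shape (rows, columns of one image)
image_shape = (100, 120)
n_points = 50

# A seeded generator gives the same selection on every run (seed is arbitrary)
rng = np.random.default_rng(seed=0)

# Draw distinct flat pixel indices, then convert them to (row, column) pairs
flat_choice = rng.choice(image_shape[0] * image_shape[1], size=n_points, replace=False)
i_points, j_points = np.unravel_index(flat_choice, image_shape)
```

With a fixed seed, saving the points is still useful as a record of exactly which spectra were labelled.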
Select the spectra to label from the data file
[ ]:
raw_spectra = datacube[:, i_points, j_points].T
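The indexing above pairs each entry of `i_points` with the matching entry of `j_points`, and the transpose puts one spectrum on each row. A small sketch on a toy array illustrates the resulting shape. The toy cube and its axis order (wavelength, row, column) are assumptions chosen to match how `datacube` is indexed in this notebook.

```python
import numpy as np

# Toy cube with 3 wavelength points and a 4 x 5 image (values are arbitrary)
datacube = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Two example pixels: (row 0, column 1) and (row 3, column 4)
i_points = np.array([0, 3])
j_points = np.array([1, 4])

# Paired integer indexing gives shape (n_wavelengths, n_points);
# the transpose makes it (n_points, n_wavelengths)
raw_spectra = datacube[:, i_points, j_points].T
```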
Normalise each spectrum to be in range [0, 1]
[ ]:
labelled_spectra = np.empty((len(raw_spectra), len(model.constant_wavelengths)))
for i in range(len(labelled_spectra)):
    labelled_spectra[i] = normalise_spectrum(raw_spectra[i], model=model)
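To illustrate what scaling to the range [0, 1] means, here is a minimal sketch of plain min-max scaling with NumPy. The helper `min_max_scale` is hypothetical and is not the implementation of `normalise_spectrum`, which also takes the model into account (its output has the length of `model.constant_wavelengths`).

```python
import numpy as np

def min_max_scale(spectrum):
    """Hypothetical helper: linearly rescale a 1D spectrum so min -> 0 and max -> 1."""
    spectrum = np.asarray(spectrum, dtype=np.float64)
    lo, hi = spectrum.min(), spectrum.max()
    if hi == lo:
        # A flat spectrum has no range to scale; return zeros to avoid dividing by zero
        return np.zeros_like(spectrum)
    return (spectrum - lo) / (hi - lo)

example = min_max_scale([3.0, 1.0, 2.0, 5.0])
```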
Script to semi-automate the classification process
Type a number 0 - 4 to assign a classification to the plotted spectrum
Type 5 to skip and move on to the next spectrum
Type back to move to the previous spectrum
Type exit to give up (keeping ones already done)
The labels are stored in the labels variable (-1 represents an unclassified spectrum)
[ ]:
labels = np.full(len(labelled_spectra), -1, dtype=int)
i = 0
while i < len(labelled_spectra):
    # Show the spectrum to be classified along with description
    plt.figure(figsize=(15, 10))
    plt.plot(labelled_spectra[i])
    plt.show()
    print("i = {}".format(i))
    print("absorption --- both --- emission / skip")
    print("       0    1    2    3    4          5 ")
    # Ask for user's classification
    classification = input('Type [0-4]:')
    try:  # Must be an integer
        classification_int = int(classification)
    except ValueError:
        classification_int = -1  # Try current spectrum again
    if classification == 'back':
        i = max(i - 1, 0)  # Go back to the previous spectrum (never before the first)
    elif classification == 'exit':
        break  # Exit the loop, keeping labels that were given
    elif 0 <= classification_int <= 4:  # Valid classification
        labels[i] = classification_int  # Assign the classification to the spectrum
        i += 1  # Move on to the next spectrum
    elif classification_int == 5:
        i += 1  # Skip and move on to the next spectrum
    else:  # Invalid input
        pass  # Try current spectrum again
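If you change the number of classifications, the input handling is the part that needs to change. One possible way to make that easier, sketched below, is to move the decision into a small function that can be checked without plotting anything. The name `parse_response` and its return values are hypothetical and not part of MCALF. Unlike the loop above, this sketch also strips whitespace and ignores case.

```python
def parse_response(text, n_classes=5):
    """Hypothetical helper: turn the typed text into an (action, value) pair.

    Actions: 'label' (value is the class), 'skip', 'back', 'exit', 'retry'.
    """
    text = text.strip().lower()  # Assumption: tolerate stray spaces and capitals
    if text in ('back', 'exit'):
        return text, None
    try:
        value = int(text)
    except ValueError:
        return 'retry', None  # Not a number and not a known command
    if 0 <= value < n_classes:
        return 'label', value  # Valid classification
    if value == n_classes:
        return 'skip', None  # The number after the last class means skip
    return 'retry', None  # Integer outside the allowed range
```

In the loop, the returned action would then decide whether to assign a label, change `i`, or break.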
Plot bar chart of classification populations
[ ]:
unique, counts = np.unique(labels, return_counts=True)
plt.figure()
plt.bar(unique, counts)
plt.title('Number of spectra in each classification')
plt.xlabel('Classification')
plt.ylabel('N_spectra')
plt.show()
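Note that `np.unique` also reports the -1 entries, so unclassified spectra appear as their own bar, and classes that received no spectra are missing from the chart. A minimal sketch of counting every class explicitly is shown here. The label values are made up for illustration.

```python
import numpy as np

# Made-up labels: -1 marks spectra that were skipped or never reached
labels = np.array([0, 2, -1, 4, 2, -1, 1])

# Keep only classified spectra, then count classes 0 to 4 (empty classes give 0)
classified = labels[labels >= 0]
counts = np.bincount(classified, minlength=5)

# Count the unclassified spectra separately
n_unclassified = int(np.sum(labels == -1))
```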
Overplot the spectra of each classification
[ ]:
for classification in unique:
    plt.figure()
    for spectrum in labelled_spectra[labels == classification]:
        plt.plot(model.constant_wavelengths, spectrum)
    plt.title('Classification {}'.format(classification))
    plt.yticks([0, 1])
    plt.show()
Save the labelled spectra for use later
[ ]:
np.save('labelled_data.npy', labelled_spectra)
np.save('labels.npy', labels)
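When the saved files are loaded again, any spectra still labelled -1 should probably be dropped before they are used as ground truth. The following is a minimal sketch of that step. It writes made-up arrays to a temporary directory so that it runs on its own; in practice you would load the two files saved above.

```python
import os
import tempfile

import numpy as np

# Made-up stand-ins for the arrays saved above (6 spectra, 10 wavelength points)
labelled_spectra = np.random.default_rng(0).random((6, 10))
labels = np.array([0, -1, 3, 4, -1, 1])

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'labelled_data.npy')
    labels_path = os.path.join(tmp, 'labels.npy')
    np.save(data_path, labelled_spectra)
    np.save(labels_path, labels)
    # Load the arrays back, as a later session would
    X = np.load(data_path)
    y = np.load(labels_path)

# Drop spectra that were never classified (label -1)
mask = y != -1
X_train, y_train = X[mask], y[mask]
```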