Assignment - Spectral Peaks

Aim

Given a data set of energy as a function of frequency, identify the location of any peaks and estimate the intensity of each (the area within the peak). The data is noisy and sits on an underlying trend, which may be non-linear.

You will need to

Identify the trend. It could be linear, quadratic, or exponential.

\[ bx + c \qquad\mbox{or}\qquad a x^2 + b x + c \qquad\mbox{or}\qquad a e^{b x} + c \]

Identify the number of peaks, and whether each is positive or negative.
For each peak, estimate
- The centre
- The height above the underlying trend
- The area within the peak.
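
As a starting point for the trend step, the following is a minimal sketch that fits the linear and quadratic candidates by ordinary least squares using only the standard library. The names `polyfit`, `polyval` and `fit_trends` are hypothetical helpers, not part of the assignment. The exponential candidate is not covered here; it needs a non-linear fit (for example `scipy.optimize.curve_fit`). Note also that a plain least-squares fit is pulled towards the peaks, and that the quadratic never has a larger residual than the linear fit, so choosing between models needs a margin of your own design.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients with the highest power first.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution (coefficients for powers 0..degree).
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / A[i][i]
    return coeffs[::-1]

def polyval(coeffs, x):
    """Evaluate a polynomial (highest power first) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def fit_trends(xs, ys):
    """Fit linear and quadratic trends; return {name: (coeffs, rms residual)}."""
    fits = {}
    for name, degree in [('linear', 1), ('quadratic', 2)]:
        coeffs = polyfit(xs, ys, degree)
        sq = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
        fits[name] = (coeffs, math.sqrt(sq / len(xs)))
    return fits
```

Subtracting `polyval(coeffs, x)` from each energy value gives the detrended signal in which the peaks can then be located.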

Dataset

Each data file consists of two columns (frequency, energy). Both are scaled so that

the input (frequency) is in the interval 0 to 1.
the output (energy) is in the interval -5 to 5.
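
A minimal sketch for reading one of these files with the standard library is given below. It assumes the two columns are separated by whitespace or commas and that any non-numeric lines can be skipped; check this against your own files. `load_spectrum` and `in_expected_range` are hypothetical helper names.

```python
def load_spectrum(path):
    """Read two numeric columns (frequency, energy) from a text file."""
    freq, energy = [], []
    with open(path) as fh:
        for line in fh:
            # Assumption: columns are separated by whitespace or commas.
            parts = line.replace(',', ' ').split()
            if len(parts) < 2:
                continue  # skip blank or incomplete lines
            try:
                f, e = float(parts[0]), float(parts[1])
            except ValueError:
                continue  # skip header or comment lines
            freq.append(f)
            energy.append(e)
    return freq, energy

def in_expected_range(freq, energy):
    """Check the scaling stated above: frequency in [0, 1], energy in [-5, 5]."""
    return (all(0.0 <= f <= 1.0 for f in freq)
            and all(-5.0 <= e <= 5.0 for e in energy))
```

A failed range check is a cheap early signal that a file was read with the wrong column order or separator.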

There are two data files common to all of you, and each of you has four individual data files. The common data files are

The individual data files are determined by your student number. Under GDPR I should not publish your student number, so instead run the following code with your student number to list your files.

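import hashlib, os

def list_dataset(id='W00000000'):
    """Print the four data file names for a student number and whether each exists in data/."""

    def name_to_seed(s):
        # Hash the problem name and reduce it to a 32-bit seed.
        return int(hashlib.sha512(str(s).encode()).hexdigest(),16) % 2**32

    for p in ['A','B','C','D']:
        problem = id.upper() + '-' + p
        seed = name_to_seed(problem)
        filename = f'p-{seed:010d}.txt'

        # The last column is True if the file is present in the data folder.
        print(id, p, filename, os.path.isfile(f'data/{filename}'))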
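
Because `list_dataset` only prints, it can be convenient to have the same names available as values when looping over your files. The sketch below is a hypothetical variant (`dataset_filenames` is not part of the assignment) that applies the same hashing and returns the names instead.

```python
import hashlib

def dataset_filenames(student_id='W00000000'):
    """Return {problem letter: file name}, using the same hashing as list_dataset."""
    names = {}
    for p in ['A', 'B', 'C', 'D']:
        problem = student_id.upper() + '-' + p
        # Same seed rule as list_dataset: SHA-512 of the problem name, modulo 2**32.
        seed = int(hashlib.sha512(problem.encode()).hexdigest(), 16) % 2**32
        names[p] = f'p-{seed:010d}.txt'
    return names
```

The returned values are bare file names; prefix them with `data/` when opening the files.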

All of the images are in the archive.
I have also included some simpler data files which should help the development of your code.

To use the data files in the archive, run the following code, which downloads the archive to the same folder as your notebook and extracts its contents to the current folder. Note that it creates two folders:

data — contains the data files (.txt) and images of the spectral peaks (.png).
output — empty folder that you will use for your output.

import urllib.request
import zipfile

url = 'https://setu-computationalphysics.github.io/live/topics/05-Spectral_Peaks/01-Problem_Specification/files/individual.zip'

# Download the archive next to the notebook.
urllib.request.urlretrieve(url, 'data.zip')

# Extract the data and output folders into the current folder.
with zipfile.ZipFile('data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

Required functions

Your notebook should have the function

def process_data(filename='kmurphy.txt', ADD_YOUR_OTHER_ARGUMENTS_HERE_IF_WANTED):
    pass

which, when passed the name of a txt file in folder data containing the data points:

identifies the trend.
identifies the peaks: centre, height, width and area.
generates a txt file, saved to the output folder, summarising the results (see next section for format).
generates a csv file, saved to the output folder, summarising the numerical results (see next section for format).
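
For the peak step, one simple approach is sketched below. It is not a prescribed method. It takes the detrended signal (energy minus the fitted trend) and groups consecutive samples whose magnitude exceeds a threshold into peaks. The name `find_peaks`, the `threshold` and `min_samples` parameters, and the definition of width as the extent of the run above the threshold are all assumptions. The area is integrated on the absolute signal, since the sample output in the next section lists positive areas for negative peaks.

```python
def find_peaks(xs, residual, threshold, min_samples=3):
    """Locate peaks as runs of same-sign samples with |residual| > threshold."""
    runs, run = [], []
    for i, r in enumerate(residual):
        above = abs(r) > threshold
        same_sign = bool(run) and (r > 0) == (residual[run[-1]] > 0)
        if above and (not run or same_sign):
            run.append(i)
            continue
        if run:
            runs.append(run)
        # Start a new run if this sample is above threshold with opposite sign.
        run = [i] if above else []
    if run:
        runs.append(run)

    peaks = []
    for run in runs:
        if len(run) < min_samples:
            continue  # too short to be a peak; treated as a noise spike
        top = max(run, key=lambda i: abs(residual[i]))
        # Trapezoidal rule on the absolute residual across the run.
        area = sum(0.5 * (abs(residual[i]) + abs(residual[i + 1])) * (xs[i + 1] - xs[i])
                   for i in run[:-1])
        peaks.append({'centre': xs[top],
                      'height': residual[top],
                      'width': xs[run[-1]] - xs[run[0]],
                      'area': area})
    return peaks
```

A threshold of a few times the noise level is a reasonable first guess. Because only the part of each peak above the threshold is integrated, the area is slightly underestimated, which you may want to correct for.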

Output on Generating a Dataset

Each problem in folder data should generate two files in folder output with the same basename. The txt file should have contents like

Trend:
    -0.826 x^2 +0.437 x +1.188
Number of peaks: 9
    Peak 0: c=0.1531    h=-0.7210    w=0.0152    a=0.0073
    Peak 1: c=0.2067    h= 0.8023    w=0.0161    a=0.0087
    Peak 2: c=0.2981    h=-0.7538    w=0.0162    a=0.0082
    Peak 3: c=0.3662    h=-0.7761    w=0.0165    a=0.0086
    Peak 4: c=0.4095    h=-0.9409    w=0.0156    a=0.0098
    Peak 5: c=0.5164    h= 0.7872    w=0.0178    a=0.0094
    Peak 6: c=0.7771    h= 0.6157    w=0.0188    a=0.0078
    Peak 7: c=0.8169    h=-0.8501    w=0.0118    a=0.0067
    Peak 8: c=0.8675    h=-0.9438    w=0.0128    a=0.0081

and the csv should have contents like

centre,height,width,area
0.15313706712831004,-0.7210378048534919,0.015214553810080614,0.007339635054804814
0.20672243434181659,0.8022816492248048,0.016131975744163238,0.0086590775416792
0.2981468738739708,-0.7537656205977101,0.0162392088819803,0.008189519158254182
0.3661868473275811,-0.7761159842494884,0.016479398153909846,0.008557071965497709
0.4094939136295262,-0.9409097426352324,0.01557047009979476,0.009801824000493207
0.5164469575753882,0.7871710197136214,0.01776265756743838,0.009354794467043001
0.7770860565574634,0.6156611198029336,0.018847540823450016,0.007763429801721572
0.8168541310701665,-0.8501072299550835,0.011831154164392762,0.006729116279430779
0.8675404723019733,-0.9437532545518952,0.012808659598865919,0.00808759427537501
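
Both formats above can be produced with the standard library. The sketch below assumes each peak is a dict with keys `centre`, `height`, `width` and `area`, and that the trend has already been formatted as a string. `write_results` and its parameters are hypothetical names.

```python
import csv
import os

def write_results(basename, trend_text, peaks, folder='output'):
    """Write <basename>.txt and <basename>.csv in the layout shown above."""
    os.makedirs(folder, exist_ok=True)
    txt_path = os.path.join(folder, basename + '.txt')
    csv_path = os.path.join(folder, basename + '.csv')

    with open(txt_path, 'w') as fh:
        fh.write('Trend:\n')
        fh.write(f'    {trend_text}\n')
        fh.write(f'Number of peaks: {len(peaks)}\n')
        for k, p in enumerate(peaks):
            # '{: .4f}' leaves a space for the sign so the height column lines up.
            fh.write(f"    Peak {k}: c={p['centre']:.4f}    h={p['height']: .4f}"
                     f"    w={p['width']:.4f}    a={p['area']:.4f}\n")

    with open(csv_path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=['centre', 'height', 'width', 'area'])
        writer.writeheader()
        writer.writerows(peaks)  # full-precision values, one row per peak

    return txt_path, csv_path
```

The txt file is rounded for reading, while the csv keeps full precision so that results can be compared numerically.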

Deliverables

Notebook containing code and conclusions describing your implementation.

The notebook must be fully executable as is.
Downloads and extracts the data archive.
Processes the common and student specific data files saving results to required files in folder output.
- The txt file contains the trend and the peak information
- The csv file contains the peak data with columns (centre, height, width, area).
In the event of errors or issues in your algorithm, your code should output a sane (understandable) message to the txt file.
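
One way to meet the last requirement is to wrap the processing call so that any exception is turned into a short readable message in the txt file. The sketch below assumes the processing function takes the file name as its first argument. `run_safely` is a hypothetical helper, and the wording of the message is only an example.

```python
import os

def run_safely(process, filename, folder='output'):
    """Call process(filename); on failure write a readable message to the txt output.

    Returns True on success and False if an error was caught.
    """
    try:
        process(filename)
        return True
    except Exception as exc:
        os.makedirs(folder, exist_ok=True)
        base = os.path.splitext(os.path.basename(filename))[0]
        with open(os.path.join(folder, base + '.txt'), 'w') as fh:
            fh.write(f'Could not process {filename}\n')
            fh.write(f'Reason: {type(exc).__name__}: {exc}\n')
        return False
```

Calling `run_safely(process_data, name)` for each file lets the notebook carry on to the remaining data sets after a failure.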