pysprint.core.bases package¶
Submodules¶
pysprint.core.bases.algorithms module¶

longest_common_subsequence
(x1, y1, x2, y2, tol=None)¶ Given two datasets with xy values, find their longest common subsequence, allowing a small tolerance which might be needed due to numerical errors. This function is mainly used when two datasets’ y values need to be multiplied together, but their domains are slightly off.
Parameters:  x1 (np.ndarraylike) – The x values for the original array
 y1 (np.ndarraylike) – The y values for the original array
 x2 (np.ndarraylike) – The x values for the second array
 y2 (np.ndarraylike) – The y values for the second array
 tol (float, optional) – The tolerance which determines how big a difference is allowed between x values to interpret them as the same datapoint.
Returns:  longest_x1 (np.ndarray) – The x values of longest common subsequence in x1.
 longest_y1 (np.ndarray) – The y values of longest common subsequence in y1.
 longest_x2 (np.ndarray) – The x values of longest common subsequence in x2.
 longest_y2 (np.ndarray) – The y values of longest common subsequence in y2.
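The matching idea can be illustrated with a short sketch (an illustration only, not pysprint's actual implementation; the helper name `common_subsequence` is made up):

```python
import numpy as np

def common_subsequence(x1, y1, x2, y2, tol=1e-6):
    """Keep the points of two datasets whose x values agree within tol."""
    idx1, idx2 = [], []
    for i, xv in enumerate(x1):
        # find the closest x2 value and keep the pair if it is
        # within the tolerance
        j = int(np.argmin(np.abs(x2 - xv)))
        if abs(x2[j] - xv) < tol:
            idx1.append(i)
            idx2.append(j)
    return x1[idx1], y1[idx1], x2[idx2], y2[idx2]

x1 = np.array([1.0, 2.0, 3.0, 4.0])
y1 = x1 ** 2
x2 = np.array([2.0000001, 3.0000001, 5.0])
y2 = x2 + 1

lx1, ly1, lx2, ly2 = common_subsequence(x1, y1, x2, y2, tol=1e-3)
```

Here only the points at x ≈ 2 and x ≈ 3 survive, since they are the only ones present in both datasets within the tolerance.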
pysprint.core.bases.dataset module¶
This file implements the Dataset class with all the functionality that an interferogram should have in general.

class
Dataset
(x, y, ref=None, sam=None, meta=None, errors='raise', callback=None, parent=None, **kwargs)¶ Bases:
object
This class implements all the functionality a dataset should have in general.

__init__
(x, y, ref=None, sam=None, meta=None, errors='raise', callback=None, parent=None, **kwargs)¶ Base constructor for Dataset.
Parameters:  x (np.ndarray) – The x values.
 y (np.ndarray) – The y values.
 ref (np.ndarray, optional) – The reference arm’s spectra.
 sam (np.ndarray, optional) – The sample arm’s spectra.
 meta (dict-like) – The dictionary containing further information about the dataset. Can be extended, or set to any valid ~collections.abc.Mapping.
 errors (str, optional) – Whether to raise on mismatching sized data. Must be “raise” or “force”. If “force”, truncate to the shortest size. Default is “raise”.
 callback (callable, optional) – The function that notifies parent objects about SPP related changes. In most cases the user should leave this empty. The default callback is only initialized if this object is constructed by the pysprint.SPPMethod object.
 parent (any class, optional) – The object which handles the callback function. In most cases the user should leave this empty.
 kwargs (dict, optional) – The window class to use in WFTMethod. Has no effect while using other methods. Must be a subclass of pysprint.core.windows.WindowBase.
Note
To load data from files, see the alternative constructor parse_raw.
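The errors="force" behaviour described above can be sketched as follows (a minimal illustration with a made-up helper, not pysprint's internal code):

```python
import numpy as np

def match_lengths(*arrays):
    # Sketch of the errors="force" idea: instead of raising on
    # mismatched sizes, truncate every array to the shortest length.
    shortest = min(len(a) for a in arrays)
    return [np.asarray(a)[:shortest] for a in arrays]

x = np.arange(10)
y = np.arange(8)
x, y = match_lengths(x, y)  # both now have 8 elements
```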

meta
¶ Additional info about the dataset

chrange
(current_unit, target_unit='phz')¶ Change the domain range of the dataset.
 Supported units for frequency:
 PHz
 THz
 GHz
 Supported units for wavelength:
 um
 nm
 pm
 fm
Parameters:  current_unit (str) – The current unit of the domain. Case insensitive.
 target_unit (str, optional) – The target unit. Must be compatible with the current unit. Case insensitive. Default is phz.
 inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.
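The underlying conversion is ordinary unit algebra; for example, wavelength in nm maps to angular frequency in PHz (rad/fs) via ω = 2πc/λ. A minimal sketch, assuming the frequency domain is angular frequency as used by chdomain:

```python
import numpy as np

C_NM_PHZ = 299.792458  # speed of light in nm * PHz (i.e. nm/fs)

def nm_to_phz(wavelength_nm):
    # omega [PHz, rad/fs] = 2 * pi * c / lambda [nm]
    return 2 * np.pi * C_NM_PHZ / np.asarray(wavelength_nm, dtype=float)

omega = nm_to_phz(800.0)  # ~2.3546 PHz for an 800 nm carrier
```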

transform
(func, axis=None, args=None, kwargs=None)¶ Apply an arbitrary function to the dataset.
Parameters:  func (callable) – The function to apply on the dataset.
 axis (int or str, optional) – The axis on which the operation is performed. Must be ‘x’, ‘y’, ‘0’ or ‘1’.
 args (tuple, optional) – Additional arguments to pass to func.
 kwargs (dict, optional) – Additional keyword arguments to pass to func.
 inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.
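A hypothetical stand-alone version of the same idea (the tuple-based signature below is illustrative, not pysprint's API):

```python
import numpy as np

def transform(x, y, func, axis="y", args=(), kwargs=None):
    # Apply `func` to the chosen axis and leave the other untouched.
    kwargs = kwargs or {}
    if axis in ("x", 0, "0"):
        return func(x, *args, **kwargs), y
    return x, func(y, *args, **kwargs)

x = np.linspace(0, 1, 5)
y = np.ones(5)
x2, y2 = transform(x, y, np.multiply, axis="y", args=(3,))  # y scaled by 3
```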

phase_plot
(exclude_GD=False)¶ Plot the phase if the dispersion is already calculated.
Parameters: exclude_GD (bool) – Whether to exclude the GD part of the polynomial. Default is False.

delay
¶ Return the delay value if set.

positions
¶ Return the SPP position(s) if set.

scale_up
()¶ If the interferogram is normalized to the [0, 1] interval, scale it up to [-1, 1] with easy algebra.
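The “easy algebra” here is presumably the linear map y → 2y - 1, sketched below (an illustration, not the library's code):

```python
import numpy as np

def scale_up(y):
    # Map a [0, 1]-normalized signal onto [-1, 1]: y -> 2*y - 1.
    return 2 * np.asarray(y, dtype=float) - 1

scaled = scale_up([0.0, 0.5, 1.0])
```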

GD_lookup
(reference_point=None, engine='cwt', silent=False, **kwargs)¶ Quick GD lookup: it finds extremal points near the reference_point and returns the average of 2*pi divided by the distances between consecutive minimal or maximal values. Since it relies on peak detection, the results may be irrelevant in some cases. If the parent class is ~pysprint.CosFitMethod, then it will set the predicted value as the initial parameter for fitting.
Parameters:  reference_point (float) – The reference point for the algorithm.
 engine (str, optional) – The backend to use. Must be “cwt”, “normal” or “fft”. “cwt” will use the scipy.signal.find_peaks_cwt function to detect peaks, “normal” will use scipy.signal.find_peaks. The “fft” engine uses the Fourier transform and looks for the outer peak to guess the delay value. It’s not reliable when working with low delay values.
 silent (bool, optional) – Whether to print the results immediately. Default is False.
 kwargs (dict, optional) – Additional keyword arguments to pass to the peak detection algorithms. These are: pmin, pmax, threshold, width, floor_thres, etc. Most of them are described in the find_peaks and find_peaks_cwt docs.
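The idea behind the lookup can be demonstrated on synthetic fringes: for y = cos(d·x) the consecutive maxima are 2π/d apart, so averaging 2π over the spacings recovers d. A self-contained sketch with a naive peak finder (pysprint itself uses the scipy.signal engines described above):

```python
import numpy as np

def estimate_delay(x, y):
    # Naive interior local-maximum detection by neighbour comparison.
    peaks = np.where((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:]))[0] + 1
    # Consecutive maxima of cos(d*x) are 2*pi/d apart, so averaging
    # 2*pi over the spacings recovers d.
    return 2 * np.pi / np.mean(np.diff(x[peaks]))

x = np.linspace(0, 50, 5000)
y = np.cos(3.0 * x)          # synthetic fringes with "delay" d = 3
est = estimate_delay(x, y)   # close to 3
```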

static
wave2freq
(value)¶ Switches a single value between wavelength and angular frequency.

static
freq2wave
(value)¶ Switches a single value between angular frequency and wavelength.

classmethod
parse_raw
(filename, ref=None, sam=None, skiprows=0, decimal='.', sep=None, delimiter=None, comment=None, usecols=None, names=None, swapaxes=False, na_values=None, skip_blank_lines=True, keep_default_na=False, meta_len=1, errors='raise', callback=None, parent=None, **kwargs)¶ Alternative constructor for the Dataset object. Helps to load in data just by giving the filenames in the target directory.
Parameters:  filename (str) – base interferogram file generated by the spectrometer
 ref (str, optional) – reference arm’s spectra file generated by the spectrometer
 sam (str, optional) – sample arm’s spectra file generated by the spectrometer
 skiprows (int, optional) – Skip rows at the top of the file. Default is 0.
 decimal (str, optional) – Character recognized as the decimal separator in the original dataset. Often “,” for European data. Default is “.”.
 sep (str, optional) – The delimiter in the original interferogram file. Default is “,”.
 delimiter (str, optional) – The delimiter in the original interferogram file. This is preferred over the sep argument if both are given. Default is “,”.
 comment (str, optional) – Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Default is ‘#’.
 usecols (list-like or callable, optional) – If there are multiple columns in the file, use only a subset of them. Default is [0, 1], which will use the first two columns.
 names (arraylike, optional) – List of column names to use. Default is [‘x’, ‘y’]. Column marked with x (y) will be treated as the x (y) axis. Combined with the usecols argument it’s possible to select data from a large number of columns.
 swapaxes (bool, optional) – Whether to swap x and y values in every parsed file. Default is False.
 na_values (scalar, str, list-like, or dict, optional) – Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
 skip_blank_lines (bool) – If True, skip over blank lines rather than interpreting as NaN values. Default is True.
 keep_default_na (bool) – Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior changes. Default is False. More information available at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
 meta_len (int, optional) – The number of lines at the top of the original file containing the meta information about the dataset. It is parsed to be dict-like. If the parsing fails, a new entry will be created in the dictionary with the key unparsed. Default is 1.
 errors (string, optional) – Determines how mismatching sized data columns are handled. The default is raise, which will raise on any error. If set to force, it will truncate every column to have the same shape as the shortest one. It truncates from the top of the file.
 callback (callable, optional) – The function that notifies parent objects about SPP related changes. In most cases the user should leave this empty. The default callback is only initialized if this object is constructed by the pysprint.SPPMethod object.
 parent (any class, optional) – The object which handles the callback function. In most cases the user should leave this empty.
 kwargs (dict, optional) – The window class to use in WFTMethod. Has no effect while using other methods. Must be a subclass of pysprint.core.windows.WindowBase.

data
¶ Returns the current dataset as a pandas.DataFrame.

is_normalized
¶ Returns whether the dataset is normalized.

chdomain
()¶ Changes from the wavelength [nm] domain to the angular frequency [PHz] domain and vice versa.
Parameters: inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.

detect_peak_cwt
(widths, floor_thres=0.05, side='both')¶ Basic algorithm to find extremal points in data using scipy.signal.find_peaks_cwt.
Parameters:  widths (np.ndarray) – The widths passed to find_peaks_cwt.
 floor_thres (float) – Will be removed.
 side (str) – The side to use. Must be “both”, “max” or “min”. Default is “both”.
Returns:  xmax (arraylike) – x coordinates of the maximums
 ymax (arraylike) – y coordinates of the maximums
 xmin (arraylike) – x coordinates of the minimums
 ymin (arraylike) – y coordinates of the minimums
Note
When using “min” or “max” as side, all the detected minimal and maximal values will be returned, but only the given side will be recorded for further calculation.

savgol_fil
(window=5, order=3)¶ Applies a Savitzky-Golay filter on the dataset.
Parameters:  window (int) – Length of the convolutional window for the filter. Default is 5.
 order (int) – Degree of the polynomial to fit after the convolution. If not odd, it’s incremented by 1. Must be lower than window. Usually it’s a good idea to stay with a low degree, e.g. 3 or 5. Default is 3.
Note
If arms were given, it will merge them into the self.y and self.y_norm variables. Also applies a linear interpolation on the dataset (and raises a warning).
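A Savitzky-Golay filter fits a low-order polynomial in a sliding window and evaluates it at the window centre. A minimal pure-numpy sketch of the principle (not the filter implementation pysprint applies internally):

```python
import numpy as np

def savgol(y, window=5, order=3):
    # Fit a polynomial of `order` in a sliding `window` and evaluate
    # it at the window centre; endpoints are left unchanged here.
    half = window // 2
    t = np.arange(-half, half + 1)
    out = np.array(y, dtype=float)
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(t, y[i - half:i + half + 1], order)
        out[i] = np.polyval(coeffs, 0)
    return out

x = np.arange(10, dtype=float)
smooth = savgol(x ** 2)
# A polynomial of degree <= order passes through unchanged.
```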

slice
(start=None, stop=None)¶ Cuts the dataset on x axis.
Parameters:  start (float) – Start value of cutting interval. Not giving a value will keep the dataset’s original minimum value. Note that giving None will leave original minimum untouched too. Default is None.
 stop (float) – Stop value of cutting interval. Not giving a value will keep the dataset’s original maximum value. Note that giving None will leave original maximum untouched too. Default is None.
 inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.
Note
If arms were given, it will merge them into the self.y and self.y_norm variables. After this operation, the arms’ spectra cannot be retrieved.
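The cut itself amounts to a boolean mask on the x axis; a sketch with a hypothetical stand-alone helper:

```python
import numpy as np

def slice_dataset(x, y, start=None, stop=None):
    # Keep points where start <= x <= stop; None leaves that end open.
    lo = -np.inf if start is None else start
    hi = np.inf if stop is None else stop
    mask = (x >= lo) & (x <= hi)
    return x[mask], y[mask]

x = np.linspace(0, 10, 11)
xs, ys = slice_dataset(x, x ** 2, start=3, stop=7)
```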

convolution
(window_length, std=20)¶ Convolve the dataset with a specified Gaussian window.
Parameters:  window_length (int) – Length of the gaussian window.
 std (float) – Standard deviation of the gaussian window. Default is 20.
Note
If arms were given, it will merge them into the self.y and self.y_norm variables. Also applies a linear interpolation on dataset.
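The operation amounts to convolving y with a normalized Gaussian window; an illustrative sketch (the exact window construction is an assumption, not pysprint's code):

```python
import numpy as np

def gaussian_smooth(y, window_length=11, std=2.0):
    # Build a normalized Gaussian window and convolve it with y.
    t = np.arange(window_length) - (window_length - 1) / 2
    window = np.exp(-0.5 * (t / std) ** 2)
    window /= window.sum()
    return np.convolve(y, window, mode="same")

smoothed = gaussian_smooth(np.ones(50))
# Away from the edges, a constant signal is left unchanged.
```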

resample
(N, kind='linear', **kwds)¶ Resample the interferogram to have N datapoints.
Parameters:  N (int) – The number of datapoints required.
 kind (str, optional) – The type of interpolation to use. Default is linear.
 kwds (optional) – Additional keyword arguments to pass to scipy.interpolate.interp1d.
 inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.
Raises: PySprintWarning, if trying to resample to fewer datapoints than the original.
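With kind="linear" the operation is equivalent to interpolating onto an evenly spaced grid of N points; a sketch using np.interp in place of scipy.interpolate.interp1d:

```python
import numpy as np

def resample(x, y, N):
    # Interpolate y onto N evenly spaced points over the same domain.
    new_x = np.linspace(x.min(), x.max(), N)
    return new_x, np.interp(new_x, x, y)

x = np.linspace(0, 1, 100)
new_x, new_y = resample(x, 2 * x, 400)
```

Linear interpolation reproduces linear data exactly, which makes the behaviour easy to check.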

detect_peak
(pmax=0.1, pmin=0.1, threshold=0.1, except_around=None, side='both')¶ Basic algorithm to find extremal points in data using scipy.signal.find_peaks.
Parameters:  pmax (float) – Prominence of maximum points. The lower it is, the more peaks will be found. Default is 0.1.
 pmin (float) – Prominence of minimum points. The lower it is, the more peaks will be found. Default is 0.1.
 threshold (float) – Sets the minimum distance (measured on the y axis) required for a point to be accepted as extremal. Default is 0.1.
 except_around (array or tuple, optional) – Overwrites the threshold to be 0 at the given interval. Format is (lower, higher) or [lower, higher]. Default is None.
 side (str) – The side to use. Must be “both”, “max” or “min”. Default is “both”.
Returns:  xmax (arraylike) – x coordinates of the maximums
 ymax (arraylike) – y coordinates of the maximums
 xmin (arraylike) – x coordinates of the minimums
 ymin (arraylike) – y coordinates of the minimums
Note
When using “min” or “max” as side, all the detected minimal and maximal values will be returned, but only the given side will be recorded for further calculation.
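The simplest version of such extremum detection compares each interior point with its neighbours (a toy sketch; pysprint itself uses scipy.signal.find_peaks with the prominence handling described above):

```python
import numpy as np

def detect_extrema(x, y):
    # Interior local maxima/minima found by neighbour comparison.
    is_max = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    is_min = (y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])
    imax = np.where(is_max)[0] + 1
    imin = np.where(is_min)[0] + 1
    return x[imax], y[imax], x[imin], y[imin]

x = np.linspace(0, 4 * np.pi, 2000)
xmax, ymax, xmin, ymin = detect_extrema(x, np.sin(x))
# sin(x) on [0, 4*pi] has two maxima and two minima.
```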

plot_outside
(*args, **kwargs)¶ Plot the current dataset outside of the notebook, in a separate window. For detailed parameters see the Dataset.plot function.

plot
(ax=None, title=None, xlim=None, ylim=None, **kwargs)¶ Plot the dataset.
Parameters:  ax (matplotlib.axes.Axes, optional) – An axis to draw the plot on. If not given, it will plot on the last used axis.
 title (str, optional) – The title of the plot.
 xlim (tuple, optional) – The limits of x axis.
 ylim (tuple, optional) – The limits of y axis.
 kwargs (dict, optional) – Additional keyword arguments to pass to plot function.
Note
If SPP positions are correctly set, it will mark them on plot.

show
()¶ Equivalent to plt.show().

normalize
(filename=None, smoothing_level=0)¶ Normalize the interferogram by finding the upper and lower envelopes in an interactive matplotlib editor. Points can be deleted with the d key and inserted with the i key. Points can also be dragged using the mouse. When complete, just close the window. Must be called with an interactive backend. The best practice is to call this function inside the ~pysprint.interactive context manager.
Parameters:  filename (str, optional) – Save the normalized interferogram named by filename in the working directory. If not given it will not be saved. Default None.
 smoothing_level (int, optional) – The smoothing level used on the dataset before finding the envelopes. It applies a Savitzky-Golay filter under the hood. Default is 0.
 inplace (bool, optional) – Whether to apply the operation on the dataset in an “inplace” manner. This means if inplace is True it will apply the changes directly on the current dataset and returns None. If inplace is False, it will leave the current object untouched, but returns a copy of it, and the operation will be performed on the copy. It’s useful when chaining operations on a dataset.

open_SPP_panel
(header=None)¶ Opens the interactive matplotlib editor for SPP data. Use the i key to add a new point and the d key to delete one. The delay field is parsed to only get the numeric values. Close the window on finish. Must be called with an interactive backend. The best practice is to call this function inside the ~pysprint.interactive context manager.
Parameters: header (str, optional) – An arbitrary string to include as a header. This can be any attribute’s name, or even a metadata key.

emit
()¶ Emit the current SPP data.
Returns:  delay (np.ndarray) – The delay value for the current dataset, shaped exactly like positions.
 positions (np.ndarray) – The given SPP positions.

set_SPP_data
(delay, positions, force=False)¶ Set the SPP data (delay and SPP positions) for the dataset.
Parameters:  delay (float) – The delay value that belongs to the current interferogram. Must be given in fs units.
 positions (float or iterable) – The SPP positions that belong to the current interferogram. Must be a float or a sequence of floats (tuple, list, np.ndarray, etc.)
 force (bool, optional) – Can be used to set specific SPP positions which are outside of the dataset’s range. Note that in most cases you should avoid using this option. Default is False.
Note
Every position given must be in the current dataset’s range, otherwise ValueError is raised. Be careful to change the domain to frequency before feeding values into this function.
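The range check described in the note can be sketched as follows (a hypothetical stand-alone helper, not pysprint's method):

```python
import numpy as np

def validate_spp_positions(x, positions, force=False):
    # Every SPP position must fall inside [x.min(), x.max()]
    # unless force=True, mirroring the note above.
    positions = np.atleast_1d(positions).astype(float)
    if not force:
        inside = (positions >= x.min()) & (positions <= x.max())
        if not inside.all():
            raise ValueError("SPP position outside the dataset's range")
    return positions

x = np.linspace(1.5, 3.0, 100)
pos = validate_spp_positions(x, 2.2)   # inside the domain: accepted
```

Passing a position such as 5.0 here would raise ValueError unless force=True is given.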
