First data analysis
We'll show you how it's done!
Fit: Fitting functions to data
In the last section, we loaded the data from the file "scripts/examples/amplitude.dat" into the table data() (if this is not the case for you, it is best to repeat the last section). We have also seen that the data contains amplitude vs. frequency curves, each with a maximum. So it is obvious that these are resonance measurements. So as a first analysis you can try to fit a resonance curve (also called Lorentz or Breit-Wigner function) to the data to extract the parameters of the measurement.
The commands shown here have many more options to adapt them to special situations. To show all of them here would go beyond the scope and possibilities. We recommend that you read the internal documentation (e.g. help fit and help fft) for all commands to familiarize yourself with them.
To do this, we first define the Lorentz function. This is not mandatory, but it makes the code in the following more readable:
<- define lorentz(x, x0, amplitude, damping, offset) := amplitude/sqrt((x^2-x0^2)^2+4*damping^2*x^2) + offset
-> The function definitions were updated successfully.
The parameters of the Lorentz function x0, amplitude, damping and offset, which are currently unknown to you, can now be extracted from the data using the fit command:
<- fit data(:,1:2) -with=lorentz(x, x0, amp, damp, off)
-> Fitting "amp/sqrt((x^2-x0^2)^2+4*damp^2*x^2) + off" ... Success.
===========================================================================================
-> NUMERE: FITTING RESULT
===========================================================================================
-> Function: 0.43003/sqrt((x^2-0.00083679^2)^2+4*3.0264^2*x^2) + 0.0080817
[...]
-> Parameter Initial value Fitted Asymptotic standard error
----------------------------------------------------------------------------------------
amp 0 0.4300348 ± 369.36 (8.589e+004%)
damp 0 3.026374 ± 3.0706e+006 (1.015e+008%)
off 0 0.008081738 ± 4.5169 (5.589e+004%)
x0 0 0.0008367877 ± 2.2077e+010 (2.638e+015%)
[...]
===========================================================================================
For you, there will be much more text in the command window here. We have abbreviated the very long output that fit generates at this point and want to focus on the result for now. In the table you will find the alphabetically sorted list of parameters with their initial and adjusted values as well as an estimate for their accuracy. If you look at the percentage deviation, you will surely suspect that something is wrong here. If you are more used to working with the coefficient of determination R², you can also calculate it with the following line:
<- 1 - chi^2 / sum((data(:, 2) - avg(data(:, 2)))^2)
-> ans = 0.08690886
Here, too, it can be seen that apparently no particularly good match is given. The variable chi was automatically generated by fit. To improve the analysis, let's graph the data together with the fit.
To do this, use the plot command again:
<- plot data(), Fit(x)
Again, two things stand out: You have used data() as an abbreviation for data(:, 1:2) (this actually happens very often and NumeRe will always interpret this abbreviation as appropriately as possible). Also, you did not use the lorentz() function to plot, but Fit(x). This function is always generated by fit with the current result, so you can plot the result easily and quickly.
Nevertheless, the result does not look very pleasing, as you can see on the right side. Although a function has been put through the data points, strangely enough hardly any point seems to be really close to the function.
Unfortunately, this is a fundamental problem of fitting algorithms. The results depend on the initial values and can be completely different by slight changes of these parameters. Therefore, for non-polynomial functions, it is useful to specify the approximate parameters as estimates.
Until now, we have used the (automatic) start value 0 for all parameters when fitting (correctly, this is the automatic initial value of the newly created variable). But you can also specify this explicitly by using the params=PARAMS option:
<- fit data(:,1:2) -with=lorentz(x, x0, amp, damp, off) params=[x0=2.5,amp=2,damp=0,off=0]
-> Fitting "amp/sqrt((x^2-x0^2)^2+4*damp^2*x^2) + off" ... Success.
===========================================================================================
-> NUMERE: FITTING RESULT
===========================================================================================
-> Function: 0.031963/sqrt((x^2-2.4575^2)^2+4*0.11405^2*x^2) + 0.007953
[...]
-> Parameter Initial value Fitted Asymptotic standard error
----------------------------------------------------------------------------------------
amp 2 0.0319629 ± 0.0087816 (27.47%)
damp 0 0.114046 ± 0.030319 (26.58%)
off 0 0.00795303 ± 0.0045607 (57.35%)
x0 2.5 2.457466 ± 0.015739 (0.6404%)
[...]
===========================================================================================
If you calculate the coefficient of determination R² again, you get the following:
<- 1 - chi^2 / sum((data(:, 2) - avg(data(:, 2)))^2)
-> ans = 0.9443302
Now completely different parameter values are generated as result and also the percentage deviations of the errors are clearly smaller. If you now repeat the last plot (simply press [ARROW UP] twice and then [ENTER]), you will get the result on the right side.
Here the measured points are more clearly near the Lorentz function. So this parameter set describes the measured points much better. Especially the approximation of the x0 parameter has a big impact on the result in this case. The reason for this is that x0 occurs in the denominator of the function. Such parameters are particularly sensitive, since small changes here can have large effects.
Of course, you can also display the result for x0 directly in the legend. Just enter the following:
<- plot data(), Fit(x) "x_0 = " + #x0
Fourier transforms using FFT
Frequency or Fourier analysis is one of the most important tools in data analysis. Therefore we want to show you an example how to calculate a Fourier transformation in NumeRe. For this we will use the command fft. But first we need a suitable data set, which we can transform. Enter the following four lines into the command window:
<- new signal();
<- signal(:, 1) = {0:0.001:0.5};
<- signal(:, 2) = sin(_2pi*50*signal(:, 1)) + sin(_2pi*120*signal(:, 1)) + gauss(0, 2);
<- signal(#, :) = "Time t [s]", "Signal x(t)";
With the first three lines, we have created a somewhat noisy data set, like you might encounter when working with audio recordings. In the second line you can also see a way to generate a vector with iteratively filled values: {A : B : C} = {A, A+B, A+2*B, ..., C}, where C becomes part of the vector only if it is expressible by A+n*B. Moreover, {A : B} = {A : 1 : B}, or {A : -1 : B} if B < A.
The noise is accounted for by the function gauss(x0,x1), which produces Gaussian distributed noise around the position x0 with half-width x1. We also used automatic vectorization by using table columns (signal(:, 1)). The last row then adds the column headers, as you already know.
If you then display the data graphically, for example using
<- plot signal() -set title="Noisy time domain signal" xlabel="Time t [s]" ylabel="x(t)" interpolate
then you will see for example (the data is randomized by the Gaussian noise) the result on the right side. This is also the first time we have used the interpolate option to represent the data points as a closed curve.
Using this representation to infer the frequencies present in this signal can be quite complicated if not impossible. In fact, with this curve, you would expect that it is just noise and any further analysis is unnecessary.
However, you know (as an experienced Data Scientist) that you can reconstruct the frequencies of a signal using an FFT. For this purpose, you use the command fft, as already indicated above:
<- fft signal()
The result of the FFT is then created in the fftdata() table. Of course you can also specify the target table with the -target=TABLE() option. To improve the signal-to-noise ratio, we convert the result into the so-called power spectral density:
<- fftdata(:, 2) ^= 2;
Correctly it should be fftdata(:, 2) *= conj(fftdata(:, 2)), but fft generates a purely real-valued amplitude-phase representation by default. If you want complex-valued results, you have to use the -complex option in fft.
If you then display the result graphically, you may be surprised:
<- plot fftdata(:, 1:2) -set title="Power spectral density" xlabel="Frequency f [Hz]" ylabel="Power"
The spectrum is horizontally symmetrical with positive and negative frequencies. You can also see a horizontal line running from left to right. What is the reason for this?
As you probably know, a frequency spectrum always contains positive and negative frequencies, whereby with a purely real input signal the negative frequencies are exact reflections of the positive frequencies. The horizontal line (in your case it can be at a different height) results from the fact that the frequencies are not displayed in the interval [-500 .. 500] as expected, but in the standard display in the interval [0 .. 500, -500 .. 0] and when plotting with interpolate the 500 and -500 points are connected horizontally (both amplitudes are identical).
So let's restrict the row range from the fftdata() table and also use the bars option to switch to a bar chart:
<- plot fftdata(:250, 1:2) -set title="Power spectral density" xlabel="Frequency f [Hz]" ylabel="Power" bars
We can now clearly see two sharp peaks protruding above the background. Looking at them, we can see that their frequencies are at about 50 Hz and 120 Hz. To verify this even more precisely, we can look for the indices of the table where amplitudes are greater than 0.5:
<- logtoidx(fftdata(:250, 2) > 0.5)
-> ans = { 26, 61}
The function logtoidx() converts the passed logic values (amplitude greater than 0.5: yes/no) into the indices where the logical statement is true. If we use these two indices to extract the corresponding frequencies, we get the following frequencies.
<- fftdata({26, 61}, 1)
-> ans = { 50, 120}
So, as we had suspected, the peaks are at the frequencies 50 Hz and 120 Hz. But since we had generated the data with exactly these two frequencies, this result probably surprises no one.