How to Access Data from the Net

In many cases, the data you want to process is not on the local machine but in an online repository. This can be either a simple file host or an API through which you can retrieve the desired, possibly even pre-filtered data via a URL. As an easy-to-understand example, we'll show how this works using data for the DAX, which we get from the Yahoo Finance API.

The Basics: How to Access Data Online?

Many providers on the net make their data available via an API, which can usually be found with little effort. Occasionally you will also find documentation on how to use it, but usually a look at the corresponding URL is enough. In our example, we want to get historical data for the DAX from the Yahoo Finance API. We can find it (here for the DAX; for other stock indices it works analogously) at https://de.finance.yahoo.com/quote/%5EGDAXI/history?p=%5EGDAXI. Yahoo makes it easy for us by providing a download button whose link directly reveals a readable URL:

https://query1.finance.yahoo.com/v7/finance/download/%5EGDAXI?period1=1663484464&period2=1695020464&interval=1d&events=history&includeAdjustedClose=true

The first part, up to and including /download/, identifies the API endpoint. It is not relevant for our purposes and can simply be used as is. The parameters that follow are much more interesting. %5EGDAXI is the name of the stock index and contains a percent-encoded circumflex (^ is not allowed verbatim in a URL), so it should be read as ^GDAXI. As another example, the EURO STOXX 50 is represented as %5ESTOXX50E, i.e. ^STOXX50E.

period1 and period2 describe two points in time in seconds since the epoch, i.e. since January 1, 1970; the two values above correspond to September 18, 2022 and September 18, 2023, exactly one year apart. interval contains the time resolution, in our case 1 day (1d), events=history says that we want historical data, and includeAdjustedClose=true adds another column with the adjusted closing prices to the received data.

If we use this link, the API sends us a CSV file, which is very convenient and easy to process. Consequently, we can use the url command to get data directly from this API. The following line saves the CSV file as dax.csv in <loadpath>:

url "https://query1.finance.yahoo.com/v7/finance/download/%5EGDAXI?period1=1663484464&period2=1695020464&interval=1d&events=history&includeAdjustedClose=true" -file="<loadpath>/dax.csv";

But of course we don't want the data between the same two fixed dates every time, but rather, for example, always the last year up to the current date. To do this, we need to express the corresponding dates in seconds, which we can easily do with the function as_date(nYear, nMonth, nDay). We then convert the obtained value directly into a string using the string parser: #as_date(...)
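Putting this together, the download line might look like the following sketch (the variable names nPeriod1/nPeriod2 and the exact placement of the # string parser inside the URL are assumptions, as is the assumption that nYear, nMonth and nDay already hold the current date):

```
## Endpoints of the time window: the current date and the same date one
## year earlier, both in seconds since the epoch (assumed variable names)
nPeriod1 = as_date(nYear-1, nMonth, nDay);
nPeriod2 = as_date(nYear, nMonth, nDay);
## The "#" string parser inserts the numeric values into the URL string
url "https://query1.finance.yahoo.com/v7/finance/download/%5EGDAXI?period1=#nPeriod1&period2=#nPeriod2&interval=1d&events=history&includeAdjustedClose=true" -file="<loadpath>/dax.csv";
```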

(The complete code ready to copy and run can be found below).

If we run the command, download the data and then load it into NumeRe, we can plot it using the candlestick plot option, for example. We will get the following result:
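As a rough sketch, loading and plotting could look like this (the table name dax() and the exact spelling of the candlestick option are assumptions; the source only states that a candlestick plot option exists):

```
## Load the downloaded CSV and plot it as a candlestick chart
## (table name and option spelling are assumptions)
load "<loadpath>/dax.csv"
plot dax() -set candlestick
```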

How Can We Further Process this Data?

If you look closely, you will notice that some of the bars in the candlestick plot are wider than all the others. If you examine the corresponding data, you will find that it contains only working days, i.e. the stock exchange trading days. Since the width of a bar depends on the distance to the following data point, the wider bars always mark the days immediately before weekends and holidays.

We can fix this by inserting the missing weekends into the data with the value NaN. In doing so, we want to preserve the order of the dates and place the weekends at the correct positions. This is where matop mode provides the helpful function coordstogrid(), which determines at which grid point of another grid the passed points would be located; in other words, it maps coordinates from one data grid into another (the whole thing also works one-dimensionally). To use it, we first create a complete coordinate axis including the weekends using {as_date(nYear-1, nMonth, nDay):86400:as_date(nYear, nMonth, nDay)}, where 86400 is exactly the number of seconds per day. We then map the existing data points into this axis with coordstogrid().
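A minimal sketch of this step (assumptions: the loaded table is called dax(), its first column holds the timestamps, and coordstogrid(points, axis) returns the target grid indices; the exact matop syntax may differ):

```
## Complete daily axis from one year ago until today (86400 s per day)
matop axis = {as_date(nYear-1, nMonth, nDay):86400:as_date(nYear, nMonth, nDay)};
## Determine for each trading day the grid point it falls on in the new axis
matop idx = coordstogrid(dax(:, 1), axis);
## The data can then be copied to the rows given by idx; all other rows
## (the weekends and holidays) remain NaN
```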

The result already looks much more respectable. But we can do more to make the data even more accessible: we can highlight the long-term trend with a moving average. One possible function for this is the matrix function movavg(), which also copes with data series that contain NaNs. Alternatively, we can use the retouch command to simply fill the missing points in the data. This is not perfect, of course, but it can make patchy data much easier to process in some cases.
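Sketched out, the two variants could look like this (the window length, the column index of the closing prices, and the exact argument order of movavg() and retouch are assumptions):

```
## Variant 1: moving average directly on the patchy data (movavg() tolerates NaNs)
matop trend = movavg(dax(:, 5), 30);   ## assumed: 30-day window, column 5 = close
## Variant 2: retouch the missing points first, then average
retouch dax()
matop trend = movavg(dax(:, 5), 30);
```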

Below you can see the cleaned data set together with the moving average, once calculated directly and once after first applying retouch.

Ready to give it a shot? Here's the code: