AlgoTrader Documentation

Chapter 22. Historical Data

22.1. InfluxDB
22.2. Live Data Recording
22.3. Historical Data Download
22.4. Interactive Brokers Historical Data Download
22.5. Quandl Historical Data Download
22.6. Google Finance Historical Data Download
22.7. Market Data File Format
22.7.1. Tick Data Files
22.7.2. Bar Data Files

AlgoTrader uses the time series database InfluxDB for storage of historical data. InfluxDB is an open source database written in Go specifically designed to handle time series data with high availability and high performance requirements.

In addition, the platform provides a feature for downloading historical data from external market data providers. This historical data can be used for strategy simulations or for any other type of analysis.

The HistoricalDataService provides all relevant functions for handling historical data.

To use the Historical Data Service the corresponding Spring profiles have to be added via VM argument:

InteractiveBrokers Historical Data Service:

-Dspring.profiles.active=influxDB,iBHistoricalData

Bloomberg Historical Data Service:

-Dspring.profiles.active=influxDB,bBHistoricalData

Quandl Historical Data Service:

-Dspring.profiles.active=influxDB,qdlHistoricalData

Noop Historical Data Service:

-Dspring.profiles.active=influxDB,noopHistoricalData

Note

The Noop Historical Data Service does not have a connection to an external data source. It can be used during Simulation to access existing historical data from InfluxDB.

For detailed information on InfluxDB please have a look at the InfluxDB Documentation.

InfluxDB can be installed locally or via Docker; please see Chapter 2, Installation and Deployment.

InfluxDB provides both a command line client (CLI) and a REST-based API, which is used by various client-side libraries. AlgoTrader uses the influxdb-java library to communicate with InfluxDB. For all operations that involve the time series database InfluxDB, the following Spring profile has to be specified via VM argument:

-Dspring.profiles.active=influxDB

If InfluxDB is installed locally, the influx command should be available via the command line. Executing influx will start the CLI and automatically connect to the local InfluxDB instance. The output should look like this:

$ influx -precision rfc3339
Connected to http://localhost:8086 version 1.1.x
InfluxDB shell 1.1.x
>

Note

  • The InfluxDB HTTP API runs on port 8086 by default. Therefore, influx connects to localhost on port 8086 by default. If these defaults need to be altered, please run influx --help.

  • The -precision argument specifies the format/precision of any returned timestamps. In the example above, rfc3339 tells InfluxDB to return timestamps in RFC3339 format (YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ).

The command line is now ready to take input in the form of Influx Query Language (a.k.a. InfluxQL) statements. To exit the InfluxQL shell, type exit and hit return.

Most InfluxQL statements must operate against a specific database. The CLI provides a convenience statement, USE <db-name>, which will automatically set the database for all future requests. To use the algotrader database please type:

> USE algotrader
Using database algotrader
>

Now future commands will only be run against the algotrader database.

At this point, SQL-like queries can be executed against the database. In InfluxDB, tables are called measurements. AlgoTrader uses the two measurements tick and bar. Columns that hold actual data (e.g. open or high) are called fields, and columns holding static metadata (e.g. barSize) are called tags.

As an example the following query shows all current bars in the database:

> select * from bar 
name: bar
time                    barSize close   feedType        high    low     open    securityId      vol
----                    ------- -----   --------        ----    ---     ----    ----------      ---
2017-01-02T16:48:05Z    MIN_1   116.41  IB              116.42  116.41  116.42  104             0
2017-01-02T16:49:05Z    MIN_1   116.44  IB              116.44  116.4   116.41  104             91
2017-01-02T16:50:04Z    MIN_1   116.44  IB              116.44  116.44  116.44  104             93
2017-01-02T16:59:00Z    MIN_1   116.49  IB              116.51  116.44  116.44  104             0

For an in depth description of the query syntax please visit the InfluxDB query language documentation.

To import existing data into InfluxDB please use the following command:

$ influx -import -path <path-to-file>

To import bar data the import file has to be formatted as follows:

# DML
# CONTEXT-DATABASE: algotrader

bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30319,high=1.30402,low=1.30319,close=1.30367,vol=0 1324245720000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30369,high=1.30369,low=1.30351,close=1.30352,vol=0 1324245780000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30353,high=1.30383,low=1.30353,close=1.30382,vol=0 1324245840000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30381,high=1.30411,low=1.30373,close=1.30373,vol=0 1324245900000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30378,high=1.30428,low=1.30376,close=1.30425,vol=0 1324245960000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30426,high=1.30426,low=1.30396,close=1.30399,vol=0 1324246020000000000
bar,securityId=25,feedType=IB,barSize=MIN_1 open=1.30401,high=1.30411,low=1.30371,close=1.30378,vol=0 1324246080000000000
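The bar lines above follow the InfluxDB line protocol: the measurement name and tags come first, followed by the fields and the nanosecond timestamp. The following sketch (illustrative only, not an AlgoTrader utility) builds one such line from a bar record:

```python
# Illustrative: format a bar record as an InfluxDB line-protocol entry
# matching the import format shown above. Tags (securityId, feedType,
# barSize) precede the comma-separated fields; the trailing timestamp
# is in nanoseconds since the Unix epoch.
def bar_to_line_protocol(security_id, feed_type, bar_size,
                         open_, high, low, close, vol, ts_ns):
    tags = f"bar,securityId={security_id},feedType={feed_type},barSize={bar_size}"
    fields = f"open={open_},high={high},low={low},close={close},vol={vol}"
    return f"{tags} {fields} {ts_ns}"

line = bar_to_line_protocol(25, "IB", "MIN_1",
                            1.30319, 1.30402, 1.30319, 1.30367, 0,
                            1324245720000000000)
print(line)
```

This reproduces the first bar line of the example file above.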

To import tick data the import file has to be formatted as follows:

# DML
# CONTEXT-DATABASE: algotrader

tick,securityId=25,feedType=IB last=1.303670,lastDateTime=1324245720000,bid=1.303670,ask=1.303670,volBid=0,volAsk=0,vol=0 1324245600000000000
tick,securityId=25,feedType=IB last=1.303670,lastDateTime=1324245720000,bid=1.303670,ask=1.303670,volBid=0,volAsk=0,vol=0 1324245660000000000
tick,securityId=25,feedType=IB last=1.303670,lastDateTime=1324245720000,bid=1.303670,ask=1.303670,volBid=0,volAsk=0,vol=0 1324245720000000000
tick,securityId=25,feedType=IB last=1.303520,lastDateTime=1324245780000,bid=1.303520,ask=1.303520,volBid=0,volAsk=0,vol=0 1324245780000000000
tick,securityId=25,feedType=IB last=1.303820,lastDateTime=1324245840000,bid=1.303820,ask=1.303820,volBid=0,volAsk=0,vol=0 1324245840000000000
tick,securityId=25,feedType=IB last=1.303730,lastDateTime=1324245900000,bid=1.303730,ask=1.303730,volBid=0,volAsk=0,vol=0 1324245900000000000
tick,securityId=25,feedType=IB last=1.304250,lastDateTime=1324245960000,bid=1.304250,ask=1.304250,volBid=0,volAsk=0,vol=0 1324245960000000000

For further information on InfluxDB import please visit the InfluxDB documentation.

Note

  • The last column in the import file represents the time stamp, which needs to be defined in nanoseconds since 1970-01-01 (the Unix epoch).

  • It is also possible to gz compress import files. In this case the command line switch -compressed has to be used when importing files.
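Since the timestamps must be given in nanoseconds since the Unix epoch, a UTC date/time has to be scaled accordingly. A small illustrative helper (not part of AlgoTrader):

```python
# Illustrative: convert a UTC datetime to the nanosecond epoch timestamp
# expected in the last column of an InfluxDB import file.
from datetime import datetime, timezone

def to_epoch_ns(dt):
    # treat naive datetimes as UTC, then scale seconds to nanoseconds
    return int(dt.replace(tzinfo=timezone.utc).timestamp()) * 1_000_000_000

ts = to_epoch_ns(datetime(2011, 12, 18, 22, 2, 0))
print(ts)  # 1324245720000000000
```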

To export bar data from InfluxDB into the AlgoTrader CSV file format (see Section 22.7, “Market Data File Format”) please use the following command:

$ influx -execute "SELECT time as dateTime,open,high,low,close,vol FROM bar" -database "algotrader" -format csv -precision ms > bar.csv

To export tick data from InfluxDB please use the following command:

$ influx -execute "SELECT time as dateTime,last,lastDateTime,volBid,volAsk,bid,ask,vol FROM tick" -database "algotrader" -format csv -precision ms > tick.csv

Note

The InfluxDB export adds an extra column named "name" as the first column. In order to use the exported .csv file for simulations one has to remove the first column of the file. The following Linux command can be used to accomplish this:

cut --complement -f 1 -d, tick.csv > tick-new.csv
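On systems without GNU cut, the same column removal can be done with a short script. This is an illustrative, portable alternative (not an AlgoTrader utility):

```python
# Illustrative: drop the first ("name") column from an InfluxDB CSV
# export so the file can be used for simulations.
import csv

def drop_first_column(src, dst):
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for row in csv.reader(fin):
            writer.writerow(row[1:])  # skip the leading "name" column
```

For example, `drop_first_column("tick.csv", "tick-new.csv")` mirrors the cut command above.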

To convert an AlgoTrader CSV file into an InfluxDB import file the following two Utility classes can be used:

ch.algotrader.util.influxdb.CSVBarToInfluxDBConverter
ch.algotrader.util.influxdb.CSVTickToInfluxDBConverter

Using InfluxDB it is possible to store tick-level live data for all subscribed instruments while the system is running. To enable this feature the following setting inside conf-core.properties has to be enabled:

# enables market data persistence
statement.persistMarketData = true

In addition recorded tick-level data can be aggregated into bar-data on the fly by enabling the following setting inside conf-core.properties:

# enables tick-to-bar aggregation
statement.aggregateBars = true

# the bar size used for tick-to-bar aggregation and end-of-day historical bar download
historicalData.barSize = MIN_1
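Conceptually, tick-to-bar aggregation rolls all ticks falling into the same bar window up into a single bar. The following sketch illustrates the idea for MIN_1 bars (a simplification, not AlgoTrader's implementation, which aggregates on the fly inside the engine):

```python
# Illustrative sketch of tick-to-bar aggregation: ticks in the same
# one-minute window become one bar with open/high/low/close.
def aggregate_min1(ticks):
    """ticks: list of (epoch_seconds, last_price), sorted by time."""
    bars = {}
    for ts, price in ticks:
        minute = ts - ts % 60  # start of the one-minute window
        if minute not in bars:
            # first tick in the window opens the bar
            bars[minute] = {"open": price, "high": price,
                            "low": price, "close": price}
        else:
            b = bars[minute]
            b["high"] = max(b["high"], price)
            b["low"] = min(b["low"], price)
            b["close"] = price  # latest tick closes the bar
    return bars

bars = aggregate_min1([(0, 1.10), (10, 1.12), (59, 1.11), (60, 1.13)])
```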

The storeHistoricalBars method of the Historical Data Service saves historical bars directly into InfluxDB. If the parameter replace is set to false, storeHistoricalBars appends newly retrieved Bars after the last Bar currently in the database; Bars before the current last Bar are not touched. If the parameter replace is set to true, storeHistoricalBars replaces all current Bars in the database within the specified time period.
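The two modes can be sketched as follows (an illustrative simplification: the real service writes to InfluxDB, not to an in-memory list):

```python
# Illustrative sketch of the two storeHistoricalBars modes described
# above. Bars are modeled as (timestamp, data) tuples sorted by time.
def store_bars(existing, new_bars, replace, start, end):
    if replace:
        # drop all existing bars inside [start, end], then add the new ones
        kept = [b for b in existing if not (start <= b[0] <= end)]
        return sorted(kept + new_bars)
    # append only bars strictly newer than the last existing bar
    last = existing[-1][0] if existing else float("-inf")
    return existing + [b for b in new_bars if b[0] > last]

merged = store_bars([(1, "a"), (2, "b")], [(2, "x"), (3, "y")],
                    replace=False, start=0, end=0)
# → [(1, 'a'), (2, 'b'), (3, 'y')]  (the overlapping bar at t=2 is kept)
```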

Download and storage of historical data can be invoked via the HistoricalDataStarter.

HistoricalDataStarter replaceBars startDate endDate marketDataEventType barSize securityId(s)

For example:

HistoricalDataStarter true 2016-01-01 2016-12-31 TRADES DAY_1 10 11 12

AlgoTrader also provides features to download missing historical data for all subscribed instruments, either on startup or at a specific time of day. For these functions the following settings in the conf-core.properties file apply:

# enables end-of-day historical bar download
statement.downloadHistoricalDataEOD = true

# the bar size used for tick-to-bar aggregation and end-of-day historical bar download
historicalData.barSize = MIN_1

# the market data event type used by the end-of-day historical bar download
historicalData.marketDataEventType = MIDPOINT

# Hour-of-Day when the end-of-day historical bar download takes place
historicalData.downloadHour = 2

# enables historical bar download on startup
historicalData.downloadHistoricalDataOnStartup = true

Depending on whether InteractiveBrokers, Bloomberg or Quandl is used for the historical data download the corresponding marketData profile has to be specified via VM argument.

InteractiveBrokers:

-Dspring.profiles.active=influxDB,iBHistoricalData

Bloomberg:

-Dspring.profiles.active=influxDB,bBHistoricalData

Quandl:

-Dspring.profiles.active=influxDB,qdlHistoricalData

The Historical Data Download incorporates the historical data limitations imposed by Interactive Brokers.

With the IB API the following conditions can lead to pacing violations:

  • Making six or more historical data requests for the same Contract, Exchange and Tick Type within two seconds.

  • Making more than 60 historical data requests in any ten-minute period.

The AlgoTrader Historical Data Download avoids pacing violations by spacing subsequent download requests 10 seconds apart.
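The spacing logic can be illustrated with a minimal rate-limiting generator (a sketch, not AlgoTrader's implementation; AlgoTrader uses a 10-second interval, shortened here so the example runs quickly):

```python
# Illustrative: enforce a minimum interval between consecutive requests
# to avoid pacing violations.
import time

def paced(requests, min_interval):
    """Yield requests, sleeping so consecutive ones are at least
    min_interval seconds apart."""
    last = None
    for req in requests:
        if last is not None:
            wait = min_interval - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield req

for r in paced(["request-1", "request-2"], min_interval=0.01):
    print(r)
```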

The Historical Data Download also takes the valid Duration and Bar Size settings for Historical Data Requests into account and splits large requests into a series of smaller requests.
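Splitting a long date range into smaller sub-requests can be sketched as follows (illustrative only; the 30-day chunk size is an assumption for the example, not an IB constant):

```python
# Illustrative: split a long date range into inclusive sub-ranges of at
# most max_days days each, as a historical download might do to respect
# per-request duration limits.
from datetime import date, timedelta

def split_range(start, end, max_days=30):
    chunks = []
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=max_days - 1), end)
        chunks.append((cur, chunk_end))
        cur = chunk_end + timedelta(days=1)  # next chunk starts the day after
    return chunks

chunks = split_range(date(2016, 1, 1), date(2016, 2, 15))
```

Each resulting (start, end) pair can then be issued as its own request, spaced apart as described above.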

Quandl is a public service that provides a wide range of financial, economic and alternative data. AlgoTrader allows downloading historical data from Quandl. For more information please see Section 24.17, “Quandl”.

Two classes are available for downloading historical data from Google Finance: ch.algotrader.starter.GoogleDailyDownloader downloads daily closing prices, and ch.algotrader.starter.GoogleIntradayDownloader downloads intraday prices.

When using CSV files for back testing, all data files are placed inside the following directory structure:

An alternative approach is to feed market data for multiple securities using one file. For example, it is possible to feed market data for futures using market data from the corresponding generic future. In this approach an additional column security has to be added to the market data file, which is used to identify the actual Security.

The first line within the file is the header row.

The file name for Section 21.5, “Generic Events” follows a different logic.