본문 바로가기
Finance

Financial Market Data Access in Python

by 자동매매 2023. 3. 20.

7

Financial Market Data Access in Python

is chapter outlines several key market data sources, ranging from free to paid data sources. A more complete list of available resources can be obtained from https://github.com/wilsonfreitas/awesome-quant#data-sources.

e quality of algorithmic trading models’ signals fundamentally depends on the quality

of market data being analyzed. Has the market data been cleaned of erroneous records and is there a quality assurance process in place to rectify any errors as they occur? If there is a problem with the market data feed, how quickly can the data be corrected?

e following free data sources described are suitable for learning purposes, but not fit

for purpose as regards professional trading there may be a very low limit on the number of API calls per day, the APIs may be slow, and there is no support and no rectification of the data should it not be correct. In addition, when using any of these data providers, be aware of their terms of use.

In this chapter, we are going to cover the following key topics:

Exploring the yahoofinancials Python library

Exploring the pandas_datareader Python library

190 Financial Market Data Access in Python

Exploring the Quandl data source

Exploring the IEX Cloud data source

Exploring the MarketStack data source

Technical requirements

e Python code used in this chapter is available in the Chapter07/marketdata. ipynb notebook in the book’s code repository.

Exploring the yahoofinancials Python library

e yahoofinancials Python library provides free access to the market data available from Yahoo Finance, whose provider is ICE Data Services. e library repository is available at https://github.com/JECSand/yahoofinancials.

It provides access to historical and, for most assets, also real-time pricing data for the following:

Currencies

Indexes

Stocks

Commodities

ETFs

Mutual funds

US Treasuries

Cryptocurrencies

To find the right ticker, use the lookup at https://finance.yahoo.com/.

ere is a very strict limit on the number of calls per hour per IP address (about 1,000-2,000 requests per hour per IP address) and once you reach it, your IP address gets blocked for an extensive period of time. In addition, the functionality provided constantly changes.

Installation of the library is standard:

pip install yahoofinancials

Access to the data is very straightforward, as follows:

from yahoofinancials import YahooFinancials

e library supports both single-ticker retrieval and multiple-tickers retrieval.

Exploring the yahoofinancials Python library 191

Single-ticker retrieval

e steps regarding single-ticker retrieval are as follows:

  1. First, we define the AAPL ticker object:

aapl = yf.Ticker("AAPL")

  1. en, there is the issue of historical data retrieval. Let’s print all historical daily price data for the year of 2020:

hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'daily') print(hist)

e output starts with the following:

{'AAPL': {'eventsData': {'dividends': {'2020-02-07': {'amount': 0.1925, 'date': 1581085800, 'formatted_

date': '2020-02-07'}, '2020-05-08': {'amount': 0.205, 'date': 1588944600, 'formatted_date': '2020-05-08'}, '2020-08-07': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-11-06':

{'amount': 0.205, 'date': 1604673000, 'formatted_

date': '2020-11-06'}}, 'splits': {'2020-08-31':

{'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12',

'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY', 'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577975400, 'high': 75.1500015258789, 'low': 73.79750061035156, 'open': 74.05999755859375, 'close': 75.0875015258789, 'volume': 135480400, 'adjclose': 74.4446029663086, 'formatted_date': '2020-01-02'},

{'date': 1578061800, 'high': 75.1449966430664,

'low': 74.125, 'open': 74.2874984741211, 'close': 74.35749816894531, 'volume': 146322800, 'adjclose': 73.72084045410156, 'formatted_date': '2020-01-03'}, {'date': 1578321000, 'high': 74.98999786376953,

'low': 73.1875, 'open': 73.44750213623047, 'close': 74.94999694824219, 'volume': 118387200, 'adjclose': 74.30826568603516, 'formatted_date': '2020-01-06'}, {'date': 1578407400, 'high': 75.2249984741211, 'low': 74.37000274658203, 'open': 74.95999908447266, 'close': 74.59750366210938, 'volume': 108872000, 'adjclose':

192 Financial Market Data Access in Python

73.95879364013672, 'formatted_date': '2020-01-07'}, {'date': 1578493800, 'high': 76.11000061035156, 'low': 74.29000091552734, 'open': 74.29000091552734, 'close': 75.79750061035156, 'volume': 132079200, 'adjclose': 75.14852142333984, 'formatted_date': '2020-01-08'}, {'date': 1578580200, 'high': 77.60749816894531,

'low': 76.55000305175781, 'open': 76.80999755859375, 'close': 77.40750122070312, 'volume': 170108400, 'adjclose': 76.7447280883789, 'formatted_date': '2020- 01-09'}, {'date': 1578666600, 'high': 78.1675033569336, 'low': 77.0625, 'open': 77.6500015258789, 'close': 77.5824966430664, 'volume': 140644800, 'adjclose': 76.91822052001953, 'formatted_date': '2020-01-10'}, {'date': 1578925800, 'high': 79.26750183105469, 'low': 77.7874984741211, 'open': 77.91000366210938, 'close': 79.23999786376953, 'volume': 121532000, 'adjclose': 78.56153106689453, 'formatted_date': '2020-01-13'}, {'date': 1579012200, 'high': 79.39250183105469,

'low': 78.0425033569336, 'open': 79.17500305175781, 'close': 78.16999816894531, 'volume': 161954400, 'adjclose': 77.50070190429688, 'formatted_date': '2020- 01-14'}, {'date': 1579098600, 'high': 78.875, 'low': 77.38749694824219, 'open': 77.9625015258789, 'close': 77.83499908447266, 'volume': 121923600, 'adjclose': 77.16856384277344, 'formatted_date': '2020-01-15'}, {'date': 1579185000, 'high': 78.92500305175781, 'low': 78.02249908447266, 'open': 78.39749908447266, 'close': 78.80999755859375, 'volume': 108829200, 'adjclose': 78.13522338867188, 'formatted_date': '2020-01-16'}, {'date': 1579271400, 'high': 79.68499755859375,

'low': 78.75, 'open': 79.06749725341797, 'close': 79.68250274658203, 'volume': 137816400, 'adjclose': 79.000244140625, 'formatted_date': '2020-01-17'},

{'date': 1579617000, 'high': 79.75499725341797,

'low': 79.0, 'open': 79.29750061035156, 'close': 79.14250183105469, 'volume': 110843200, 'adjclose': 78.46488189697266, 'formatted_date': '2020-01-21'}, {'date': 1579703400, 'high': 79.99749755859375, 'low': 79.32749938964844, 'open': 79.6449966430664, 'close': 79.42500305175781, 'volume': 101832400, 'adjclose': 78.74495697021484, 'formatted_date': '2020-01-22'}, ...

Note

You can change the frequency from 'daily' to 'weekly' or 'monthly'.

Exploring the yahoofinancials Python library 193

  1. Now, let’s inspect the weekly data results:

hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'weekly') print(hist)

e output is as follows:

{'AAPL': {'eventsData': {'dividends': {'2020-02-05': {'amount': 0.1925, 'date': 1581085800, 'formatted_

date': '2020-02-07'}, '2020-05-06': {'amount': 0.205, 'date': 1588944600, 'formatted_date': '2020-05-08'}, '2020-08-05': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-11-04': {'amount': 0.205, 'date': 1604673000, 'formatted_date': '2020-11- 06'}}, 'splits': {'2020-08-26': {'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12', 'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY',

'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577854800, 'high': 75.2249984741211, 'low': 73.1875, 'open': 74.05999755859375, 'close': 74.59750366210938, 'volume': 509062400, 'adjclose': 73.95879364013672, 'formatted_date': '2020-01-01'}, {'date': 1578459600, 'high': 79.39250183105469, 'low': 74.29000091552734,

'open': 74.29000091552734, 'close': 78.16999816894531, 'volume': 726318800, 'adjclose': 77.50070190429688, 'formatted_date': '2020-01-08'}, {'date': 1579064400, 'high': 79.75499725341797, 'low': 77.38749694824219,

'open': 77.9625015258789, 'close': 79.14250183105469, 'volume': 479412400, 'adjclose': 78.46488189697266, 'formatted_date': '2020-01-15'}, {'date': 1579669200, 'high': 80.8324966430664, 'low': 76.22000122070312,

'open': 79.6449966430664, 'close': 79.42250061035156, 'volume': 677016000, 'adjclose': 78.74247741699219, 'formatted_date': '2020-01-22'}, {'date': 1580274000, 'high': 81.9625015258789, 'low': 75.55500030517578,

'open': 81.11250305175781, 'close': 79.7125015258789, 'volume': 853162800, 'adjclose': 79.02999877929688, 'formatted_date': '2020-01-29'}, {'date': 1580878800, 'high': 81.30500030517578, 'low': 78.4625015258789,

'open': 80.87999725341797, 'close': 79.90249633789062, 'volume': 545608400, 'adjclose': 79.21836853027344,

194 Financial Market Data Access in Python

'formatted_date': '2020-02-05'}, {'date': 1581483600, 'high': 81.80500030517578, 'low': 78.65249633789062, 'open': 80.36750030517578, 'close': 79.75, 'volume': 441122800, 'adjclose': 79.25482177734375, 'formatted_ date': '2020-02-12'}, {'date': 1582088400, 'high': 81.1624984741211, 'low': 71.53250122070312, 'open': 80.0, 'close': 72.0199966430664, 'volume': 776972800, 'adjclose': 71.57282257080078, 'formatted_date': '2020- 02-19'}, {'date': 1582693200, 'high': 76.0, 'low': 64.09249877929688, 'open': 71.63249969482422, 'close': 72.33000183105469, 'volume': 1606418000, 'adjclose': 71.88089752197266, 'formatted_date': '2020-02-

26'}, {'date': 1583298000, 'high': 75.8499984741211, 'low': 65.75, 'open': 74.11000061035156, 'close': 71.33499908447266, 'volume': 1204962800, 'adjclose': 70.89207458496094, 'formatted_date': '2020-03-04'}, {'date': 1583899200, 'high': 70.3050003051757 ...

  1. en, we check the monthly data results:

hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'monthly') print(hist)

e output is as follows:

{'AAPL': {'eventsData': {'dividends': {'2020-05-01': {'amount': 0.205, 'date': 1588944600, 'formatted_

date': '2020-05-08'}, '2020-08-01': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-02-01': {'amount': 0.1925, 'date': 1581085800, 'formatted_date': '2020-02-07'}, '2020-11-01': {'amount': 0.205, 'date': 1604673000, 'formatted_date': '2020-11- 06'}}, 'splits': {'2020-08-01': {'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12', 'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY',

'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577854800, 'high': 81.9625015258789, 'low': 73.1875, 'open': 74.05999755859375, 'close': 77.37750244140625, 'volume': 2934370400, 'adjclose': 76.7149887084961, 'formatted_date': '2020-01-01'}, {'date': 1580533200, 'high': 81.80500030517578, 'low': 64.09249877929688,

Exploring the yahoofinancials Python library 195

'open': 76.07499694824219, 'close': 68.33999633789062, 'volume': 3019851200, 'adjclose': 67.75486755371094, 'formatted_date': '2020-02-01'}, {'date': 1583038800, 'high': 76.0, 'low': 53.15250015258789, 'open': 70.56999969482422, 'close': 63 ...

  1. e nested JSON can easily be converted to a pandas’ DataFrame: import pandas as pd

hist_df = \

pd.DataFrame(hist['AAPL']['prices']).drop('date', axis=1).set_index('formatted_date')

print(hist_df) e output is as follows:

Figure 7.1 Nested JSON converted to a pandas’ DataFrame

196 Financial Market Data Access in Python

Notice the two columns adjclose and close. e adjusted close is the close price adjusted for dividends, stock splits, and other corporate events.

Real-time data retrieval

To get real-time stock price data, use the get_stock_price_data() function:

print(aapl.get_stock_price_data())

e output is as follows:

{'AAPL': {'quoteSourceName': 'Nasdaq Real Time Price', 'regularMarketOpen': 137.35, 'averageDailyVolume3Month': 107768827, 'exchange': 'NMS', 'regularMarketTime':

'2021-02-06 03:00:02 UTC+0000', 'volume24Hr': None, 'regularMarketDayHigh': 137.41, 'shortName': 'Apple Inc.', 'averageDailyVolume10Day': 115373562, 'longName': 'Apple

Inc.', 'regularMarketChange': -0.42500305, 'currencySymbol':

'$', 'regularMarketPreviousClose': 137.185, 'postMarketTime': '2021-02-06 06:59:58 UTC+0000', 'preMarketPrice':

None, 'exchangeDataDelayedBy': 0, 'toCurrency': None, 'postMarketChange': -0.0800018, 'postMarketPrice': 136.68, 'exchangeName': 'NasdaqGS', 'preMarketChange': None, 'circulatingSupply': None, 'regularMarketDayLow': 135.86, 'priceHint': 2, 'currency': 'USD', 'regularMarketPrice':

136.76, 'regularMarketVolume': 72317009, 'lastMarket': None, 'regularMarketSource': 'FREE_REALTIME', 'openInterest': None, 'marketState': 'CLOSED', 'underlyingSymbol': None, 'marketCap': 2295940513792, 'quoteType': 'EQUITY', 'volumeAllCurrencies': None, 'postMarketSource': 'FREE_REALTIME', 'strikePrice':

None, 'symbol': 'AAPL', 'postMarketChangePercent':

-0.00058498, 'preMarketSource': 'FREE_REALTIME', 'maxAge':

1, 'fromCurrency': None, 'regularMarketChangePercent': -0.0030980287}}

Real-time data for free data sources is usually delayed by 10 to 30 minutes.

Exploring the yahoofinancials Python library 197

As regards the retrieval of financial statements, let’s get financial statements for Apple’s stock the income statement, cash flow, and balance sheet:

statements = aapl.get_financial_stmts('quarterly',

['income', 'cash', 'balance']) print(statements)

e output is as follows:

{'incomeStatementHistoryQuarterly': {'AAPL': [{'2020-

12-26': {'researchDevelopment': 5163000000, 'effectOfAccountingCharges': None, 'incomeBeforeTax': 33579000000, 'minorityInterest': None, 'netIncome': 28755000000, 'sellingGeneralAdministrative': 5631000000, 'grossProfit': 44328000000, 'ebit': 33534000000, 'operatingIncome': 33534000000, 'otherOperatingExpenses': None, 'interestExpense': -638000000, 'extraordinaryItems': None, 'nonRecurring': None, 'otherItems': None, 'incomeTaxExpense': 4824000000, 'totalRevenue': 111439000000, 'totalOperatingExpenses': 77905000000, 'costOfRevenue': 67111000000, 'totalOtherIncomeExpenseNet': 45000000, 'discontinuedOperations': None, 'netIncomeFromContinuingOps': 28755000000, 'netIncomeApplicableToCommonShares':

28755000000}}, {'2020-09-26': {'researchDevelopment': 4978000000, 'effectOfAccountingCharges': None,

'incomeBeforeTax': 14901000000, 'minorityInterest': None, 'netIncome': 12673000000, 'sellingGeneralAdministrative': 4936000000, 'grossProfit': ...

ere are multiple uses of financial statement data in relation to algorithmic trading.

First, it can be used to determine the totality of stocks to trade in. Second, the creation of algorithmic trading signals from non-price data adds additional value.

Summary data retrieval

Summary data is accessible via the get_summary_data method:

print(aapl.get_summary_data())

198 Financial Market Data Access in Python

e output is as follows:

{'AAPL': {'previousClose': 137.185, 'regularMarketOpen': 137.35, 'twoHundredDayAverage': 119.50164, 'trailingAnnualDividendYield': 0.0058825673, 'payoutRatio': 0.2177, 'volume24Hr': None, 'regularMarketDayHigh': 137.41, 'navPrice': None, 'averageDailyVolume10Day': 115373562, 'totalAssets': None, 'regularMarketPreviousClose': 137.185, 'fiftyDayAverage': 132.86455, 'trailingAnnualDividendRate': 0.807, 'open': 137.35, 'toCurrency': None,

'averageVolume10days': 115373562, 'expireDate': '-',

'yield': None, 'algorithm': None, 'dividendRate':

0.82, 'exDividendDate': '2021-02-05', 'beta':

1.267876, 'circulatingSupply': None, 'startDate': '-', 'regularMarketDayLow': 135.86, 'priceHint': 2, 'currency': 'USD', 'trailingPE': 37.092484, 'regularMarketVolume': 72317009, 'lastMarket': None, 'maxSupply': None,

'openInterest': None, 'marketCap': 2295940513792, 'volumeAllCurrencies': None, 'strikePrice': None, 'averageVolume': 107768827, 'priceToSalesTrailing12Months': 7.805737, 'dayLow': 135.86, 'ask': 136.7, 'ytdReturn': None, 'askSize': 1100, 'volume': 72317009, 'fiftyTwoWeekHigh': 145.09, 'forwardPE': 29.410751, 'maxAge': 1, 'fromCurrency': None, 'fiveYearAvgDividendYield': 1.44, 'fiftyTwoWeekLow': 53.1525, 'bid': 136.42, 'tradeable': False, 'dividendYield': 0.0061000003, 'bidSize': 2900, 'dayHigh': 137.41}}

Summary data retrieved using this function is a summary of the financial statements function and the real-time data function.

Multiple-tickers retrieval

Multiple-tickers retrieval, also known as a bulk retrieval, is far more efficient and faster than single-ticker retrieval since most of the time associated with each download request is spent on establishing and closing the network connection.

Exploring the yahoofinancials Python library 199

Historical data retrieval

Let’s retrieve the historical prices for these FX pairs: EURCHF, USDEUR, and GBPUSD:

currencies = YahooFinancials(['EURCHF=X', 'USDEUR=X', 'GBPUSD=x']) print(currencies.get_historical_price_data('2020-01-01', '2020-12-31', 'weekly'))

e output is as follows:

{'EURCHF=X': {'eventsData': {}, 'firstTradeDate': {'formatted_ date': '2003-01-23', 'date': 1043280000}, 'currency': 'CHF', 'instrumentType': 'CURRENCY', 'timeZone': {'gmtOffset': 0}, 'prices': [{'date': 1577836800, 'high': 1.0877000093460083, 'low': 1.0818699598312378, 'open': 1.0872000455856323, 'close': 1.084280014038086, 'volume': 0, 'adjclose': 1.084280014038086, 'formatted_date': '2020-01-01'}, {'date': 1578441600,

'high': 1.083299994468689, 'low': 1.0758999586105347, 'open': 1.080530047416687, 'close': 1.0809999704360962, 'volume':

0, 'adjclose': 1.0809999704360962, 'formatted_date': '2020- 01-08'}, {'date': 1579046400, 'high': 1.0774999856948853,

'low': 1.0729299783706665, 'open': 1.076300024986267,

'close': 1.0744800567626953, 'volume': 0, 'adjclose': 1.0744800567626953, 'formatted_date': '2020-01-15'},

{'date': 1579651200, 'high': 1.0786099433898926, 'low': 1.0664700269699097, 'open': 1.0739500522613525, 'close': 1.068600058555603, 'volume': 0, 'adjclose': 1.068600058555603, 'formatted_date': '2020-01-22'}, {'date': 1580256000, 'high': 1.0736199617385864, 'low': 1.0663000345230103, 'open': 1.0723999738693237, 'close': 1.0683200359344482, 'volume': 0, 'adjclose': 1.068320035 ...

We see that the historical data does not contain any data from the financial statements.

e full list of methods supported by the library at the time of writing this book is as follows:

get_200day_moving_avg()

get_50day_moving_avg()

get_annual_avg_div_rate()

get_annual_avg_div_yield()

get_beta()

200 Financial Market Data Access in Python

get_book_value()

get_cost_of_revenue()

get_currency()

get_current_change() get_current_percent_change() get_current_price()

get_current_volume()

get_daily_dividend_data(start_date, end_date) get_daily_high()

get_daily_low()

get_dividend_rate()

get_dividend_yield()

get_earnings_per_share()

get_ebit()

get_exdividend_date()

get_financial_stmts(frequency, statement_type, reformat=True)

get_five_yr_avg_div_yield() get_gross_profit()

get_historical_price_data(start_date, end_date, time_ interval)

get_income_before_tax()

get_income_tax_expense()

get_interest_expense()

get_key_statistics_data()

get_market_cap()

get_net_income() get_net_income_from_continuing_ops() get_num_shares_outstanding(price_type='current') get_open_price()

get_operating_income()

Exploring the pandas_datareader Python library 201

get_payout_ratio()

get_pe_ratio()

get_prev_close_price() get_price_to_sales() get_research_and_development() get_stock_earnings_data(reformat=True) get_stock_exchange() get_stock_price_data(reformat=True) get_stock_quote_type_data() get_summary_data(reformat=True) get_ten_day_avg_daily_volume() get_three_month_avg_daily_volume() get_total_operating_expense() get_total_revenue()

get_yearly_high()

get_yearly_low()

We will explore the pandas_datareader library in the next section.

Exploring the pandas_datareader Python library

pandas_datareader is one of the most advanced libraries for financial data and offers access to multiple data sources.

Some of the data sources supported are as follows:

Yahoo Finance

e Federal Reserve Bank of St Louis’ FRED IEX

Quandl

Kenneth French’s data library

World Bank

202 Financial Market Data Access in Python

OECD

Eurostat

Econdb

Nasdaq Trader symbol definitions

Refer to https://pandas-datareader.readthedocs.io/en/latest/ remote_data.html for a full list.

Installation is simple:

pip install pandas-datareader

Let’s now set up the basic data retrieval parameters:

from pandas_datareader import data start_date = '2010-01-01'

end_date = '2020-12-31'

e general access method for downloading the data is data.DataReader(ticker, data_source, start_date, end_date).

Access to Yahoo Finance

Let’s download the last 10 years’ worth of Apple stock prices:

aapl = data.DataReader('AAPL', 'yahoo', start_date, end_date)

aapl

High Low Open Close Volume Adj Close Date

2010-01-

04 7.660714 7.585000 7.622500 7.643214 493729600.0 6.593426

2010-01-

05 7.699643 7.616071 7.664286 7.656428 601904800.0 6.604825

2010-01-

06 7.686786 7.526786 7.656428 7.534643 552160000.0 6.499768

2010-01-

07 7.571429 7.466072 7.562500 7.520714 477131200.0 6.487752

2010-01-

08 7.571429 7.466429 7.510714 7.570714 447610800.0 6.530883

Exploring the pandas_datareader Python library 203

... ... ... ... ... ... ...

2020-12

-21 128.309998 123.449997 125.019997 128.229996 121251600.0 128.229996

2020-12-22 134.410004 129.649994 131.610001 131.880005 168904800.0 131.880005

2020-12-23 132.429993 130.779999 132.160004 130.960007 88223700.0 130.960007

2020-12-24 133.460007 131.100006 131.320007 131.970001 54930100.0 131.970001

2020-12-28 137.339996 133.509995 133.990005 136.690002 124182900.0 136.690002

e output is virtually identical to the output from the yahoofinancials library in the preceding section.

Access to EconDB

e list of available tickers is available at https://www.econdb.com/main- indicators.

Let’s download the time series of monthly oil production in the US for the last 10 years:

oilprodus = data.DataReader('ticker=OILPRODUS', 'econdb', start_date, end_date) oilprodus

Reference Area United States of America

Energy product Crude oil

Flow breakdown Production

Unit of measure Thousand Barrels per day (kb/d) TIME_PERIOD

2010-01-01 5390

2010-02-01 5548

2010-03-01 5506

2010-04-01 5383

2010-05-01 5391

... ...

2020-04-01 11990

2020-05-01 10001

204 Financial Market Data Access in Python

2020-06-01 10436 2020-07-01 10984

2020-08-01 10406

Each data source has different output columns.

Access to the Federal Reserve Bank of St Louis' FRED

e list of available data, along with tickers, can be inspected at https://fred. stlouisfed.org/.

Let’s download the last 10 years of real gross domestic product of the USA:

import pandas as pd pd.set_option('display.max_rows', 2)

gdp = data.DataReader('GDP', 'fred', start_date, end_date) gdp

We restricted the output to just two rows:

GDP DATE 2010-01-01 14721.350 ... ... 2020-07-01 21170.252 43 rows × 1 columns

Now, let’s study 5 years of the 20-year constant maturity yields on U.S. government bonds:

gs10 = data.get_data_fred('GS20') gs10

GS20

DATE

2016-01-01 2.49

... ...

2020-11-01 1.40

59 rows × 1 columns

e Federal Reserve Bank of St Louis’ FRED data is one of the cleanest data sources available, offering complimentary support.

Exploring the pandas_datareader Python library 205

Caching queries

One of the key advantages of the library is its implementation of caching the results of queries, thereby saving bandwidth, speeding up code execution, and preventing the banning of IPs due to the overuse of APIs.

By way of an example, let’s download the entire history of Apple stock:

import datetime

import requests_cache

session = \ requests_cache.CachedSession(cache_name='cache', backend='sqlite',

expire_after = \ datetime.timedelta(days=7))

aapl_full_history = \ data.DataReader("AAPL",'yahoo',datetime.datetime(1980,1,1), datetime.datetime(2020, 12, 31), session=session)

aapl_full_history

High Low Open Close Volume Adj Close

Date

1980-12-

12 0.128906 0.128348 0.128348 0.128348 469033600.0 0.101087

... ... ... ... ... ... ...

2020-12-28 137.339996 133.509995 133.990005 136.690002 124182900.0 136.690002

Let’s now access just one data point:

aapl_full_history.loc['2013-01-07']

High 18.903572

...

Adj Close 16.284145

Name: 2013-01-07 00:00:00, Length: 6, dtype: float64

Caching can be enabled for all previous examples, too.

206 Financial Market Data Access in Python

Exploring the Quandl data source

Quandl is one of the largest repositories of economic/financial data on the internet. Its data sources can be accessed free of charge. It also offers premium data sources, for which there is a charge.

Installation is straightforward:

pip install quandl

To access the data, you have to provide an access key (apply for one at https://quandl.com):

import quandl

quandl.ApiConfig.api_key = 'XXXXXXX'

To find a ticker and data source, use https://www.quandl.com/search. Let’s now download the Monthly average consumer prices in

metropolitan France - Apples (1 Kg); EUR data:

papple = quandl.get('ODA/PAPPLE_USD') papple

Value

Date

1998-01-31 1.735999

... ...

2020-11-30 3.350000

275 rows × 1 columns

Let’s now download Apple’s fundamental data:

aapl_fundamental_data = quandl.get_table('ZACKS/FC', ticker='AAPL')

m_ticker ticker comp_name comp_name_2 exchange currency_ code per_end_date per_type per_code per_fisc_

year ... stock_based_compsn_qd cash_flow_oper_activity_

qd net_change_prop_plant_equip_qd comm_stock_div_paid_

qd pref_stock_div_paid_qd tot_comm_pref_stock_div_qd wavg_ shares_out wavg_shares_out_diluted eps_basic_net eps_ diluted_net

None

Exploring the IEX Cloud data source 207

0 AAPL AAPL APPLE INC Apple Inc. NSDQ USD 2018-09-30 A None 2018 ... NaN NaN NaN NaN None NaN 19821.51 20000.44 3.000 2.980

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

4 AAPL AAPL APPLE INC Apple Inc. NSDQ USD 2018-12-31 Q None 2019 ... 1559.0 26690.0 -3355.0 -3568.0 None -3568.0 18943.28 19093.01 1.055 1.045

5 rows × 249 columns

e difference between Yahoo and Quandl data is that the Quandl data is more reliable

and more complete.

Exploring the IEX Cloud data source

IEX Cloud is one of the commercial offerings. It offers a plan for individuals at USD 9 per month. It also offers a free plan, with a limit of 50,000 API calls per month.

e installation of the Python library is standard:

pip install iexfinance

e full library’s documentation is available at https://addisonlynch.github. io/iexfinance/stable/index.html.

e following code is designed to retrieve all symbols:

from iexfinance.refdata import get_symbols get_symbols(output_format='pandas', token="XXXXXX")

symbol exchange exchangeSuffix exchangeName name date type iexId region currency isEnabled figi cik lei

0 A NYS UN NEW YORK STOCK EXCHANGE, INC. Agilent Technologies Inc. 2020-12-29 cs IEX_46574843354B2D52 US USD True BBG000C2V3D6 0001090872 QUIX8Y7A2WP0XRMW7G29

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

9360 ZYXI NAS NASDAQ CAPITAL MARKET Zynex Inc 2020-12-29 cs IEX_4E464C4C4A462D52 US USD True BBG000BJBXZ2

0000846475 None

9361 rows × 14 columns

208 Financial Market Data Access in Python

e following code is designed to obtain Apple’s balance sheet (not available for free accounts):

from iexfinance.stocks import Stock aapl = Stock("aapl", token="XXXXXX") aapl.get_balance_sheet()

e following code is designed to get the current price (not available for free accounts):

aapl.get_price()

e following code is designed to get the sector performance report (not available for free accounts):

from iexfinance.stocks import get_sector_performance get_sector_performance(output_format='pandas',

token =token)

e following code is designed to get historical market data for Apple:

from iexfinance.stocks import get_historical_data

get_historical_data("AAPL", start="20190101", end="20200101",

output_format='pandas', token=token)

close high low open symbol volume id key subkey updated ... uLow uVolume fOpen fClose fHigh fLow fVolume label change changePercent

2019-01-02 39.48 39.7125 38.5575 38.7225 AAPL 148158948 HISTORICAL_PRICES AAPL 1606830572000

... 154.23 37039737 37.8227 38.5626 38.7897 37.6615 148158948 Jan 2, 19 0.045 0.0011

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

2019-12-31 73.4125 73.42 72.38 72.4825 AAPL 100990500 HISTORICAL_PRICES AAPL 1606830572000 ... 289.52 25247625 71.8619 72.7839 72.7914 71.7603 100990500

Dec 31, 19 0.5325 0.0073

252 rows × 25 columns

We can see that each data source offers a slightly different set of output columns.

Exploring the MarketStack data source 209

Exploring the MarketStack data source

MarketStack offers an extensive database of real-time, intra-day, and historical market data across major global stock exchanges. It offers free access for up to 1,000 monthly API requests.

While there is no official MarketStack Python library, the REST JSON API provides comfortable access to all its data in Python.

Let’s download the adjusted close data for Apple:

import requests

params = {

'access_key': 'XXXXX' }

api_result = \

requests.get('http://api.marketstack.com/v1/tickers/aapl/eod', params)

api_response = api_result.json()

print(f"Symbol = {api_response['data']['symbol']}") for eod in api_response['data']['eod']:

print(f"{eod['date']}: {eod['adj_close']}") Symbol = AAPL

2020-12-28T00:00:00+0000: 136.69 2020-12-24T00:00:00+0000: 131.97 2020-12-23T00:00:00+0000: 130.96 2020-12-22T00:00:00+0000: 131.88 2020-12-21T00:00:00+0000: 128.23 2020-12-18T00:00:00+0000: 126.655 2020-12-17T00:00:00+0000: 128.7 2020-12-16T00:00:00+0000: 127.81 2020-12-15T00:00:00+0000: 127.88 2020-12-14T00:00:00+0000: 121.78 2020-12-11T00:00:00+0000: 122.41 2020-12-10T00:00:00+0000: 123.24 2020-12-09T00:00:00+0000: 121.78

210 Financial Market Data Access in Python

2020-12-08T00:00:00+0000: 124.38 2020-12-07T00:00:00+0000: 123.75 2020-12-04T00:00:00+0000: 122.25

Let’s now download all tickers on the Nasdaq stock exchange:

api_result = !

requests.get('http://api.marketstack.com/v1/exchanges/XNAS/ tickers', params)

api_response = api_result.json()

print(f"Exchange Name = {api_response['data']['name']}") for ticker in api_response['data']['tickers']:

print(f"{ticker['name']}: {ticker['symbol']}") Exchange Name = NASDAQ Stock Exchange

Microsoft Corp: MSFT

Apple Inc: AAPL

Amazoncom Inc: AMZN

Alphabet Inc Class C: GOOG

Alphabet Inc Class A: GOOGL

Facebook Inc: FB

Vodafone Group Public Limited Company: VOD

Intel Corp: INTC

Comcast Corp: CMCSA

PepsiCo Inc: PEP

Adobe Systems Inc: ADBE

Cisco Systems Inc: CSCO

NVIDIA Corp: NVDA

Netflix Inc: NFLX

e ticket universe retrieval function is one of the most valuable functions of MarketStack. One of the first steps for all backtesting is determining the universe (that is, the complete list) of the stocks to trade. en, you restrict yourself to a subset of that list, for example, by trading only stocks with certain trends, or certain volumes.

Summary 211

Summary

In this chapter, we have outlined different ways to obtain financial and economic data in Python. In practice, you usually use multiple data sources at the same time. We explored the yahoofinancials Python library and saw single- and multiple-tickers retrievals.

We then explored the pandas_datareader Python library to access Yahoo Finance, EconDB, and Fed s Fred data and cache queries. We then explored the Quandl, IEX Cloud and MarketStack data sources.

In the next chapter, we introduce the backtesting library, Zipline, as well as the trading portfolio performance and risk analysis library, PyFolio.

댓글