7
Financial Market Data Access in Python
is chapter outlines several key market data sources, ranging from free to paid data sources. A more complete list of available resources can be obtained from https://github.com/wilsonfreitas/awesome-quant#data-sources.
e quality of algorithmic trading models’ signals fundamentally depends on the quality
of market data being analyzed. Has the market data been cleaned of erroneous records and is there a quality assurance process in place to rectify any errors as they occur? If there is a problem with the market data feed, how quickly can the data be corrected?
e following free data sources described are suitable for learning purposes, but not fit
for purpose as regards professional trading there may be a very low limit on the number of API calls per day, the APIs may be slow, and there is no support and no rectification of the data should it not be correct. In addition, when using any of these data providers, be aware of their terms of use.
In this chapter, we are going to cover the following key topics:
Exploring the yahoofinancials Python library
Exploring the pandas_datareader Python library
190 Financial Market Data Access in Python
Exploring the Quandl data source
Exploring the IEX Cloud data source
Exploring the MarketStack data source
Technical requirements
e Python code used in this chapter is available in the Chapter07/marketdata. ipynb notebook in the book’s code repository.
Exploring the yahoofinancials Python library
e yahoofinancials Python library provides free access to the market data available from Yahoo Finance, whose provider is ICE Data Services. e library repository is available at https://github.com/JECSand/yahoofinancials.
It provides access to historical and, for most assets, also real-time pricing data for the following:
Currencies
Indexes
Stocks
Commodities
ETFs
Mutual funds
US Treasuries
Cryptocurrencies
To find the right ticker, use the lookup at https://finance.yahoo.com/.
ere is a very strict limit on the number of calls per hour per IP address (about 1,000-2,000 requests per hour per IP address) and once you reach it, your IP address gets blocked for an extensive period of time. In addition, the functionality provided constantly changes.
Installation of the library is standard:
pip install yahoofinancials
Access to the data is very straightforward, as follows:
from yahoofinancials import YahooFinancials
e library supports both single-ticker retrieval and multiple-tickers retrieval.
Exploring the yahoofinancials Python library 191
Single-ticker retrieval
e steps regarding single-ticker retrieval are as follows:
- First, we define the AAPL ticker object:
aapl = yf.Ticker("AAPL")
- en, there is the issue of historical data retrieval. Let’s print all historical daily price data for the year of 2020:
hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'daily') print(hist)
e output starts with the following:
{'AAPL': {'eventsData': {'dividends': {'2020-02-07': {'amount': 0.1925, 'date': 1581085800, 'formatted_
date': '2020-02-07'}, '2020-05-08': {'amount': 0.205, 'date': 1588944600, 'formatted_date': '2020-05-08'}, '2020-08-07': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-11-06':
{'amount': 0.205, 'date': 1604673000, 'formatted_
date': '2020-11-06'}}, 'splits': {'2020-08-31':
{'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12',
'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY', 'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577975400, 'high': 75.1500015258789, 'low': 73.79750061035156, 'open': 74.05999755859375, 'close': 75.0875015258789, 'volume': 135480400, 'adjclose': 74.4446029663086, 'formatted_date': '2020-01-02'},
{'date': 1578061800, 'high': 75.1449966430664,
'low': 74.125, 'open': 74.2874984741211, 'close': 74.35749816894531, 'volume': 146322800, 'adjclose': 73.72084045410156, 'formatted_date': '2020-01-03'}, {'date': 1578321000, 'high': 74.98999786376953,
'low': 73.1875, 'open': 73.44750213623047, 'close': 74.94999694824219, 'volume': 118387200, 'adjclose': 74.30826568603516, 'formatted_date': '2020-01-06'}, {'date': 1578407400, 'high': 75.2249984741211, 'low': 74.37000274658203, 'open': 74.95999908447266, 'close': 74.59750366210938, 'volume': 108872000, 'adjclose':
192 Financial Market Data Access in Python
73.95879364013672, 'formatted_date': '2020-01-07'}, {'date': 1578493800, 'high': 76.11000061035156, 'low': 74.29000091552734, 'open': 74.29000091552734, 'close': 75.79750061035156, 'volume': 132079200, 'adjclose': 75.14852142333984, 'formatted_date': '2020-01-08'}, {'date': 1578580200, 'high': 77.60749816894531,
'low': 76.55000305175781, 'open': 76.80999755859375, 'close': 77.40750122070312, 'volume': 170108400, 'adjclose': 76.7447280883789, 'formatted_date': '2020- 01-09'}, {'date': 1578666600, 'high': 78.1675033569336, 'low': 77.0625, 'open': 77.6500015258789, 'close': 77.5824966430664, 'volume': 140644800, 'adjclose': 76.91822052001953, 'formatted_date': '2020-01-10'}, {'date': 1578925800, 'high': 79.26750183105469, 'low': 77.7874984741211, 'open': 77.91000366210938, 'close': 79.23999786376953, 'volume': 121532000, 'adjclose': 78.56153106689453, 'formatted_date': '2020-01-13'}, {'date': 1579012200, 'high': 79.39250183105469,
'low': 78.0425033569336, 'open': 79.17500305175781, 'close': 78.16999816894531, 'volume': 161954400, 'adjclose': 77.50070190429688, 'formatted_date': '2020- 01-14'}, {'date': 1579098600, 'high': 78.875, 'low': 77.38749694824219, 'open': 77.9625015258789, 'close': 77.83499908447266, 'volume': 121923600, 'adjclose': 77.16856384277344, 'formatted_date': '2020-01-15'}, {'date': 1579185000, 'high': 78.92500305175781, 'low': 78.02249908447266, 'open': 78.39749908447266, 'close': 78.80999755859375, 'volume': 108829200, 'adjclose': 78.13522338867188, 'formatted_date': '2020-01-16'}, {'date': 1579271400, 'high': 79.68499755859375,
'low': 78.75, 'open': 79.06749725341797, 'close': 79.68250274658203, 'volume': 137816400, 'adjclose': 79.000244140625, 'formatted_date': '2020-01-17'},
{'date': 1579617000, 'high': 79.75499725341797,
'low': 79.0, 'open': 79.29750061035156, 'close': 79.14250183105469, 'volume': 110843200, 'adjclose': 78.46488189697266, 'formatted_date': '2020-01-21'}, {'date': 1579703400, 'high': 79.99749755859375, 'low': 79.32749938964844, 'open': 79.6449966430664, 'close': 79.42500305175781, 'volume': 101832400, 'adjclose': 78.74495697021484, 'formatted_date': '2020-01-22'}, ...
Note
You can change the frequency from 'daily' to 'weekly' or 'monthly'.
Exploring the yahoofinancials Python library 193
- Now, let’s inspect the weekly data results:
hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'weekly') print(hist)
e output is as follows:
{'AAPL': {'eventsData': {'dividends': {'2020-02-05': {'amount': 0.1925, 'date': 1581085800, 'formatted_
date': '2020-02-07'}, '2020-05-06': {'amount': 0.205, 'date': 1588944600, 'formatted_date': '2020-05-08'}, '2020-08-05': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-11-04': {'amount': 0.205, 'date': 1604673000, 'formatted_date': '2020-11- 06'}}, 'splits': {'2020-08-26': {'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12', 'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY',
'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577854800, 'high': 75.2249984741211, 'low': 73.1875, 'open': 74.05999755859375, 'close': 74.59750366210938, 'volume': 509062400, 'adjclose': 73.95879364013672, 'formatted_date': '2020-01-01'}, {'date': 1578459600, 'high': 79.39250183105469, 'low': 74.29000091552734,
'open': 74.29000091552734, 'close': 78.16999816894531, 'volume': 726318800, 'adjclose': 77.50070190429688, 'formatted_date': '2020-01-08'}, {'date': 1579064400, 'high': 79.75499725341797, 'low': 77.38749694824219,
'open': 77.9625015258789, 'close': 79.14250183105469, 'volume': 479412400, 'adjclose': 78.46488189697266, 'formatted_date': '2020-01-15'}, {'date': 1579669200, 'high': 80.8324966430664, 'low': 76.22000122070312,
'open': 79.6449966430664, 'close': 79.42250061035156, 'volume': 677016000, 'adjclose': 78.74247741699219, 'formatted_date': '2020-01-22'}, {'date': 1580274000, 'high': 81.9625015258789, 'low': 75.55500030517578,
'open': 81.11250305175781, 'close': 79.7125015258789, 'volume': 853162800, 'adjclose': 79.02999877929688, 'formatted_date': '2020-01-29'}, {'date': 1580878800, 'high': 81.30500030517578, 'low': 78.4625015258789,
'open': 80.87999725341797, 'close': 79.90249633789062, 'volume': 545608400, 'adjclose': 79.21836853027344,
194 Financial Market Data Access in Python
'formatted_date': '2020-02-05'}, {'date': 1581483600, 'high': 81.80500030517578, 'low': 78.65249633789062, 'open': 80.36750030517578, 'close': 79.75, 'volume': 441122800, 'adjclose': 79.25482177734375, 'formatted_ date': '2020-02-12'}, {'date': 1582088400, 'high': 81.1624984741211, 'low': 71.53250122070312, 'open': 80.0, 'close': 72.0199966430664, 'volume': 776972800, 'adjclose': 71.57282257080078, 'formatted_date': '2020- 02-19'}, {'date': 1582693200, 'high': 76.0, 'low': 64.09249877929688, 'open': 71.63249969482422, 'close': 72.33000183105469, 'volume': 1606418000, 'adjclose': 71.88089752197266, 'formatted_date': '2020-02-
26'}, {'date': 1583298000, 'high': 75.8499984741211, 'low': 65.75, 'open': 74.11000061035156, 'close': 71.33499908447266, 'volume': 1204962800, 'adjclose': 70.89207458496094, 'formatted_date': '2020-03-04'}, {'date': 1583899200, 'high': 70.3050003051757 ...
- en, we check the monthly data results:
hist = aapl.get_historical_price_data('2020-01-01', '2020-12-31', 'monthly') print(hist)
e output is as follows:
{'AAPL': {'eventsData': {'dividends': {'2020-05-01': {'amount': 0.205, 'date': 1588944600, 'formatted_
date': '2020-05-08'}, '2020-08-01': {'amount': 0.205, 'date': 1596807000, 'formatted_date': '2020-08-07'}, '2020-02-01': {'amount': 0.1925, 'date': 1581085800, 'formatted_date': '2020-02-07'}, '2020-11-01': {'amount': 0.205, 'date': 1604673000, 'formatted_date': '2020-11- 06'}}, 'splits': {'2020-08-01': {'date': 1598880600, 'numerator': 4, 'denominator': 1, 'splitRatio': '4:1', 'formatted_date': '2020-08-31'}}}, 'firstTradeDate': {'formatted_date': '1980-12-12', 'date': 345479400}, 'currency': 'USD', 'instrumentType': 'EQUITY',
'timeZone': {'gmtOffset': -18000}, 'prices': [{'date': 1577854800, 'high': 81.9625015258789, 'low': 73.1875, 'open': 74.05999755859375, 'close': 77.37750244140625, 'volume': 2934370400, 'adjclose': 76.7149887084961, 'formatted_date': '2020-01-01'}, {'date': 1580533200, 'high': 81.80500030517578, 'low': 64.09249877929688,
Exploring the yahoofinancials Python library 195
'open': 76.07499694824219, 'close': 68.33999633789062, 'volume': 3019851200, 'adjclose': 67.75486755371094, 'formatted_date': '2020-02-01'}, {'date': 1583038800, 'high': 76.0, 'low': 53.15250015258789, 'open': 70.56999969482422, 'close': 63 ...
- e nested JSON can easily be converted to a pandas’ DataFrame: import pandas as pd
hist_df = \
pd.DataFrame(hist['AAPL']['prices']).drop('date', axis=1).set_index('formatted_date')
print(hist_df) e output is as follows:
Figure 7.1 Nested JSON converted to a pandas’ DataFrame
196 Financial Market Data Access in Python
Notice the two columns adjclose and close. e adjusted close is the close price adjusted for dividends, stock splits, and other corporate events.
Real-time data retrieval
To get real-time stock price data, use the get_stock_price_data() function:
print(aapl.get_stock_price_data())
e output is as follows:
{'AAPL': {'quoteSourceName': 'Nasdaq Real Time Price', 'regularMarketOpen': 137.35, 'averageDailyVolume3Month': 107768827, 'exchange': 'NMS', 'regularMarketTime':
'2021-02-06 03:00:02 UTC+0000', 'volume24Hr': None, 'regularMarketDayHigh': 137.41, 'shortName': 'Apple Inc.', 'averageDailyVolume10Day': 115373562, 'longName': 'Apple
Inc.', 'regularMarketChange': -0.42500305, 'currencySymbol':
'$', 'regularMarketPreviousClose': 137.185, 'postMarketTime': '2021-02-06 06:59:58 UTC+0000', 'preMarketPrice':
None, 'exchangeDataDelayedBy': 0, 'toCurrency': None, 'postMarketChange': -0.0800018, 'postMarketPrice': 136.68, 'exchangeName': 'NasdaqGS', 'preMarketChange': None, 'circulatingSupply': None, 'regularMarketDayLow': 135.86, 'priceHint': 2, 'currency': 'USD', 'regularMarketPrice':
136.76, 'regularMarketVolume': 72317009, 'lastMarket': None, 'regularMarketSource': 'FREE_REALTIME', 'openInterest': None, 'marketState': 'CLOSED', 'underlyingSymbol': None, 'marketCap': 2295940513792, 'quoteType': 'EQUITY', 'volumeAllCurrencies': None, 'postMarketSource': 'FREE_REALTIME', 'strikePrice':
None, 'symbol': 'AAPL', 'postMarketChangePercent':
-0.00058498, 'preMarketSource': 'FREE_REALTIME', 'maxAge':
1, 'fromCurrency': None, 'regularMarketChangePercent': -0.0030980287}}
Real-time data for free data sources is usually delayed by 10 to 30 minutes.
Exploring the yahoofinancials Python library 197
As regards the retrieval of financial statements, let’s get financial statements for Apple’s stock the income statement, cash flow, and balance sheet:
statements = aapl.get_financial_stmts('quarterly',
['income', 'cash', 'balance']) print(statements)
e output is as follows:
{'incomeStatementHistoryQuarterly': {'AAPL': [{'2020-
12-26': {'researchDevelopment': 5163000000, 'effectOfAccountingCharges': None, 'incomeBeforeTax': 33579000000, 'minorityInterest': None, 'netIncome': 28755000000, 'sellingGeneralAdministrative': 5631000000, 'grossProfit': 44328000000, 'ebit': 33534000000, 'operatingIncome': 33534000000, 'otherOperatingExpenses': None, 'interestExpense': -638000000, 'extraordinaryItems': None, 'nonRecurring': None, 'otherItems': None, 'incomeTaxExpense': 4824000000, 'totalRevenue': 111439000000, 'totalOperatingExpenses': 77905000000, 'costOfRevenue': 67111000000, 'totalOtherIncomeExpenseNet': 45000000, 'discontinuedOperations': None, 'netIncomeFromContinuingOps': 28755000000, 'netIncomeApplicableToCommonShares':
28755000000}}, {'2020-09-26': {'researchDevelopment': 4978000000, 'effectOfAccountingCharges': None,
'incomeBeforeTax': 14901000000, 'minorityInterest': None, 'netIncome': 12673000000, 'sellingGeneralAdministrative': 4936000000, 'grossProfit': ...
ere are multiple uses of financial statement data in relation to algorithmic trading.
First, it can be used to determine the totality of stocks to trade in. Second, the creation of algorithmic trading signals from non-price data adds additional value.
Summary data retrieval
Summary data is accessible via the get_summary_data method:
print(aapl.get_summary_data())
198 Financial Market Data Access in Python
e output is as follows:
{'AAPL': {'previousClose': 137.185, 'regularMarketOpen': 137.35, 'twoHundredDayAverage': 119.50164, 'trailingAnnualDividendYield': 0.0058825673, 'payoutRatio': 0.2177, 'volume24Hr': None, 'regularMarketDayHigh': 137.41, 'navPrice': None, 'averageDailyVolume10Day': 115373562, 'totalAssets': None, 'regularMarketPreviousClose': 137.185, 'fiftyDayAverage': 132.86455, 'trailingAnnualDividendRate': 0.807, 'open': 137.35, 'toCurrency': None,
'averageVolume10days': 115373562, 'expireDate': '-',
'yield': None, 'algorithm': None, 'dividendRate':
0.82, 'exDividendDate': '2021-02-05', 'beta':
1.267876, 'circulatingSupply': None, 'startDate': '-', 'regularMarketDayLow': 135.86, 'priceHint': 2, 'currency': 'USD', 'trailingPE': 37.092484, 'regularMarketVolume': 72317009, 'lastMarket': None, 'maxSupply': None,
'openInterest': None, 'marketCap': 2295940513792, 'volumeAllCurrencies': None, 'strikePrice': None, 'averageVolume': 107768827, 'priceToSalesTrailing12Months': 7.805737, 'dayLow': 135.86, 'ask': 136.7, 'ytdReturn': None, 'askSize': 1100, 'volume': 72317009, 'fiftyTwoWeekHigh': 145.09, 'forwardPE': 29.410751, 'maxAge': 1, 'fromCurrency': None, 'fiveYearAvgDividendYield': 1.44, 'fiftyTwoWeekLow': 53.1525, 'bid': 136.42, 'tradeable': False, 'dividendYield': 0.0061000003, 'bidSize': 2900, 'dayHigh': 137.41}}
Summary data retrieved using this function is a summary of the financial statements function and the real-time data function.
Multiple-tickers retrieval
Multiple-tickers retrieval, also known as a bulk retrieval, is far more efficient and faster than single-ticker retrieval since most of the time associated with each download request is spent on establishing and closing the network connection.
Exploring the yahoofinancials Python library 199
Historical data retrieval
Let’s retrieve the historical prices for these FX pairs: EURCHF, USDEUR, and GBPUSD:
currencies = YahooFinancials(['EURCHF=X', 'USDEUR=X', 'GBPUSD=x']) print(currencies.get_historical_price_data('2020-01-01', '2020-12-31', 'weekly'))
e output is as follows:
{'EURCHF=X': {'eventsData': {}, 'firstTradeDate': {'formatted_ date': '2003-01-23', 'date': 1043280000}, 'currency': 'CHF', 'instrumentType': 'CURRENCY', 'timeZone': {'gmtOffset': 0}, 'prices': [{'date': 1577836800, 'high': 1.0877000093460083, 'low': 1.0818699598312378, 'open': 1.0872000455856323, 'close': 1.084280014038086, 'volume': 0, 'adjclose': 1.084280014038086, 'formatted_date': '2020-01-01'}, {'date': 1578441600,
'high': 1.083299994468689, 'low': 1.0758999586105347, 'open': 1.080530047416687, 'close': 1.0809999704360962, 'volume':
0, 'adjclose': 1.0809999704360962, 'formatted_date': '2020- 01-08'}, {'date': 1579046400, 'high': 1.0774999856948853,
'low': 1.0729299783706665, 'open': 1.076300024986267,
'close': 1.0744800567626953, 'volume': 0, 'adjclose': 1.0744800567626953, 'formatted_date': '2020-01-15'},
{'date': 1579651200, 'high': 1.0786099433898926, 'low': 1.0664700269699097, 'open': 1.0739500522613525, 'close': 1.068600058555603, 'volume': 0, 'adjclose': 1.068600058555603, 'formatted_date': '2020-01-22'}, {'date': 1580256000, 'high': 1.0736199617385864, 'low': 1.0663000345230103, 'open': 1.0723999738693237, 'close': 1.0683200359344482, 'volume': 0, 'adjclose': 1.068320035 ...
We see that the historical data does not contain any data from the financial statements.
e full list of methods supported by the library at the time of writing this book is as follows:
get_200day_moving_avg()
get_50day_moving_avg()
get_annual_avg_div_rate()
get_annual_avg_div_yield()
200 Financial Market Data Access in Python
get_book_value()
get_cost_of_revenue()
get_currency()
get_current_change() get_current_percent_change() get_current_price()
get_current_volume()
get_daily_dividend_data(start_date, end_date) get_daily_high()
get_daily_low()
get_dividend_rate()
get_dividend_yield()
get_earnings_per_share()
get_ebit()
get_exdividend_date()
get_financial_stmts(frequency, statement_type, reformat=True)
get_five_yr_avg_div_yield() get_gross_profit()
get_historical_price_data(start_date, end_date, time_ interval)
get_income_before_tax()
get_income_tax_expense()
get_interest_expense()
get_key_statistics_data()
get_market_cap()
get_net_income() get_net_income_from_continuing_ops() get_num_shares_outstanding(price_type='current') get_open_price()
get_operating_income()
Exploring the pandas_datareader Python library 201
get_payout_ratio()
get_pe_ratio()
get_prev_close_price() get_price_to_sales() get_research_and_development() get_stock_earnings_data(reformat=True) get_stock_exchange() get_stock_price_data(reformat=True) get_stock_quote_type_data() get_summary_data(reformat=True) get_ten_day_avg_daily_volume() get_three_month_avg_daily_volume() get_total_operating_expense() get_total_revenue()
get_yearly_high()
get_yearly_low()
We will explore the pandas_datareader library in the next section.
Exploring the pandas_datareader Python library
pandas_datareader is one of the most advanced libraries for financial data and offers access to multiple data sources.
Some of the data sources supported are as follows:
Yahoo Finance
e Federal Reserve Bank of St Louis’ FRED IEX
Quandl
Kenneth French’s data library
202 Financial Market Data Access in Python
OECD
Eurostat
Econdb
Nasdaq Trader symbol definitions
Refer to https://pandas-datareader.readthedocs.io/en/latest/ remote_data.html for a full list.
Installation is simple:
pip install pandas-datareader
Let’s now set up the basic data retrieval parameters:
from pandas_datareader import data start_date = '2010-01-01'
end_date = '2020-12-31'
e general access method for downloading the data is data.DataReader(ticker, data_source, start_date, end_date).
Access to Yahoo Finance
Let’s download the last 10 years’ worth of Apple stock prices:
aapl = data.DataReader('AAPL', 'yahoo', start_date, end_date)
aapl
High Low Open Close Volume Adj Close Date
2010-01-
04 7.660714 7.585000 7.622500 7.643214 493729600.0 6.593426
2010-01-
05 7.699643 7.616071 7.664286 7.656428 601904800.0 6.604825
2010-01-
06 7.686786 7.526786 7.656428 7.534643 552160000.0 6.499768
2010-01-
07 7.571429 7.466072 7.562500 7.520714 477131200.0 6.487752
2010-01-
08 7.571429 7.466429 7.510714 7.570714 447610800.0 6.530883
Exploring the pandas_datareader Python library 203
... ... ... ... ... ... ...
2020-12
-21 128.309998 123.449997 125.019997 128.229996 121251600.0 128.229996
2020-12-22 134.410004 129.649994 131.610001 131.880005 168904800.0 131.880005
2020-12-23 132.429993 130.779999 132.160004 130.960007 88223700.0 130.960007
2020-12-24 133.460007 131.100006 131.320007 131.970001 54930100.0 131.970001
2020-12-28 137.339996 133.509995 133.990005 136.690002 124182900.0 136.690002
e output is virtually identical to the output from the yahoofinancials library in the preceding section.
Access to EconDB
e list of available tickers is available at https://www.econdb.com/main- indicators.
Let’s download the time series of monthly oil production in the US for the last 10 years:
oilprodus = data.DataReader('ticker=OILPRODUS', 'econdb', start_date, end_date) oilprodus
Reference Area United States of America
Energy product Crude oil
Flow breakdown Production
Unit of measure Thousand Barrels per day (kb/d) TIME_PERIOD
2010-01-01 5390
2010-02-01 5548
2010-03-01 5506
2010-04-01 5383
2010-05-01 5391
... ...
2020-04-01 11990
204 Financial Market Data Access in Python
2020-06-01 10436 2020-07-01 10984
2020-08-01 10406
Each data source has different output columns.
Access to the Federal Reserve Bank of St Louis' FRED
e list of available data, along with tickers, can be inspected at https://fred. stlouisfed.org/.
Let’s download the last 10 years of real gross domestic product of the USA:
import pandas as pd pd.set_option('display.max_rows', 2)
gdp = data.DataReader('GDP', 'fred', start_date, end_date) gdp
We restricted the output to just two rows:
GDP DATE 2010-01-01 14721.350 ... ... 2020-07-01 21170.252 43 rows × 1 columns
Now, let’s study 5 years of the 20-year constant maturity yields on U.S. government bonds:
gs10 = data.get_data_fred('GS20') gs10
GS20
DATE
2016-01-01 2.49
... ...
2020-11-01 1.40
59 rows × 1 columns
e Federal Reserve Bank of St Louis’ FRED data is one of the cleanest data sources available, offering complimentary support.
Exploring the pandas_datareader Python library 205
Caching queries
One of the key advantages of the library is its implementation of caching the results of queries, thereby saving bandwidth, speeding up code execution, and preventing the banning of IPs due to the overuse of APIs.
By way of an example, let’s download the entire history of Apple stock:
import datetime
import requests_cache
session = \ requests_cache.CachedSession(cache_name='cache', backend='sqlite',
expire_after = \ datetime.timedelta(days=7))
aapl_full_history = \ data.DataReader("AAPL",'yahoo',datetime.datetime(1980,1,1), datetime.datetime(2020, 12, 31), session=session)
aapl_full_history
High Low Open Close Volume Adj Close
Date
1980-12-
12 0.128906 0.128348 0.128348 0.128348 469033600.0 0.101087
... ... ... ... ... ... ...
2020-12-28 137.339996 133.509995 133.990005 136.690002 124182900.0 136.690002
Let’s now access just one data point:
aapl_full_history.loc['2013-01-07']
High 18.903572
...
Adj Close 16.284145
Name: 2013-01-07 00:00:00, Length: 6, dtype: float64
Caching can be enabled for all previous examples, too.
206 Financial Market Data Access in Python
Exploring the Quandl data source
Quandl is one of the largest repositories of economic/financial data on the internet. Its data sources can be accessed free of charge. It also offers premium data sources, for which there is a charge.
Installation is straightforward:
pip install quandl
To access the data, you have to provide an access key (apply for one at https://quandl.com):
import quandl
quandl.ApiConfig.api_key = 'XXXXXXX'
To find a ticker and data source, use https://www.quandl.com/search. Let’s now download the Monthly average consumer prices in
metropolitan France - Apples (1 Kg); EUR data:
papple = quandl.get('ODA/PAPPLE_USD') papple
Value
Date
1998-01-31 1.735999
... ...
2020-11-30 3.350000
275 rows × 1 columns
Let’s now download Apple’s fundamental data:
aapl_fundamental_data = quandl.get_table('ZACKS/FC', ticker='AAPL')
m_ticker ticker comp_name comp_name_2 exchange currency_ code per_end_date per_type per_code per_fisc_
year ... stock_based_compsn_qd cash_flow_oper_activity_
qd net_change_prop_plant_equip_qd comm_stock_div_paid_
qd pref_stock_div_paid_qd tot_comm_pref_stock_div_qd wavg_ shares_out wavg_shares_out_diluted eps_basic_net eps_ diluted_net
Exploring the IEX Cloud data source 207
0 AAPL AAPL APPLE INC Apple Inc. NSDQ USD 2018-09-30 A None 2018 ... NaN NaN NaN NaN None NaN 19821.51 20000.44 3.000 2.980
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4 AAPL AAPL APPLE INC Apple Inc. NSDQ USD 2018-12-31 Q None 2019 ... 1559.0 26690.0 -3355.0 -3568.0 None -3568.0 18943.28 19093.01 1.055 1.045
5 rows × 249 columns
e difference between Yahoo and Quandl data is that the Quandl data is more reliable
and more complete.
Exploring the IEX Cloud data source
IEX Cloud is one of the commercial offerings. It offers a plan for individuals at USD 9 per month. It also offers a free plan, with a limit of 50,000 API calls per month.
e installation of the Python library is standard:
pip install iexfinance
e full library’s documentation is available at https://addisonlynch.github. io/iexfinance/stable/index.html.
e following code is designed to retrieve all symbols:
from iexfinance.refdata import get_symbols get_symbols(output_format='pandas', token="XXXXXX")
symbol exchange exchangeSuffix exchangeName name date type iexId region currency isEnabled figi cik lei
0 A NYS UN NEW YORK STOCK EXCHANGE, INC. Agilent Technologies Inc. 2020-12-29 cs IEX_46574843354B2D52 US USD True BBG000C2V3D6 0001090872 QUIX8Y7A2WP0XRMW7G29
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9360 ZYXI NAS NASDAQ CAPITAL MARKET Zynex Inc 2020-12-29 cs IEX_4E464C4C4A462D52 US USD True BBG000BJBXZ2
0000846475 None
208 Financial Market Data Access in Python
e following code is designed to obtain Apple’s balance sheet (not available for free accounts):
from iexfinance.stocks import Stock aapl = Stock("aapl", token="XXXXXX") aapl.get_balance_sheet()
e following code is designed to get the current price (not available for free accounts):
aapl.get_price()
e following code is designed to get the sector performance report (not available for free accounts):
from iexfinance.stocks import get_sector_performance get_sector_performance(output_format='pandas',
token =token)
e following code is designed to get historical market data for Apple:
from iexfinance.stocks import get_historical_data
get_historical_data("AAPL", start="20190101", end="20200101",
output_format='pandas', token=token)
close high low open symbol volume id key subkey updated ... uLow uVolume fOpen fClose fHigh fLow fVolume label change changePercent
2019-01-02 39.48 39.7125 38.5575 38.7225 AAPL 148158948 HISTORICAL_PRICES AAPL 1606830572000
... 154.23 37039737 37.8227 38.5626 38.7897 37.6615 148158948 Jan 2, 19 0.045 0.0011
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-12-31 73.4125 73.42 72.38 72.4825 AAPL 100990500 HISTORICAL_PRICES AAPL 1606830572000 ... 289.52 25247625 71.8619 72.7839 72.7914 71.7603 100990500
Dec 31, 19 0.5325 0.0073
252 rows × 25 columns
We can see that each data source offers a slightly different set of output columns.
Exploring the MarketStack data source 209
Exploring the MarketStack data source
MarketStack offers an extensive database of real-time, intra-day, and historical market data across major global stock exchanges. It offers free access for up to 1,000 monthly API requests.
While there is no official MarketStack Python library, the REST JSON API provides comfortable access to all its data in Python.
Let’s download the adjusted close data for Apple:
import requests
params = {
'access_key': 'XXXXX' }
api_result = \
requests.get('http://api.marketstack.com/v1/tickers/aapl/eod', params)
api_response = api_result.json()
print(f"Symbol = {api_response['data']['symbol']}") for eod in api_response['data']['eod']:
print(f"{eod['date']}: {eod['adj_close']}") Symbol = AAPL
2020-12-28T00:00:00+0000: 136.69 2020-12-24T00:00:00+0000: 131.97 2020-12-23T00:00:00+0000: 130.96 2020-12-22T00:00:00+0000: 131.88 2020-12-21T00:00:00+0000: 128.23 2020-12-18T00:00:00+0000: 126.655 2020-12-17T00:00:00+0000: 128.7 2020-12-16T00:00:00+0000: 127.81 2020-12-15T00:00:00+0000: 127.88 2020-12-14T00:00:00+0000: 121.78 2020-12-11T00:00:00+0000: 122.41 2020-12-10T00:00:00+0000: 123.24 2020-12-09T00:00:00+0000: 121.78
210 Financial Market Data Access in Python
2020-12-08T00:00:00+0000: 124.38 2020-12-07T00:00:00+0000: 123.75 2020-12-04T00:00:00+0000: 122.25
Let’s now download all tickers on the Nasdaq stock exchange:
requests.get('http://api.marketstack.com/v1/exchanges/XNAS/ tickers', params)
api_response = api_result.json()
print(f"Exchange Name = {api_response['data']['name']}") for ticker in api_response['data']['tickers']:
print(f"{ticker['name']}: {ticker['symbol']}") Exchange Name = NASDAQ Stock Exchange
Microsoft Corp: MSFT
Apple Inc: AAPL
Amazoncom Inc: AMZN
Alphabet Inc Class C: GOOG
Alphabet Inc Class A: GOOGL
Facebook Inc: FB
Vodafone Group Public Limited Company: VOD
Intel Corp: INTC
Comcast Corp: CMCSA
PepsiCo Inc: PEP
Adobe Systems Inc: ADBE
Cisco Systems Inc: CSCO
NVIDIA Corp: NVDA
Netflix Inc: NFLX
e ticket universe retrieval function is one of the most valuable functions of MarketStack. One of the first steps for all backtesting is determining the universe (that is, the complete list) of the stocks to trade. en, you restrict yourself to a subset of that list, for example, by trading only stocks with certain trends, or certain volumes.
Summary 211
Summary
In this chapter, we have outlined different ways to obtain financial and economic data in Python. In practice, you usually use multiple data sources at the same time. We explored the yahoofinancials Python library and saw single- and multiple-tickers retrievals.
We then explored the pandas_datareader Python library to access Yahoo Finance, EconDB, and Fed s Fred data and cache queries. We then explored the Quandl, IEX Cloud and MarketStack data sources.
In the next chapter, we introduce the backtesting library, Zipline, as well as the trading portfolio performance and risk analysis library, PyFolio.
'Finance' 카테고리의 다른 글
Fundamental Algorithmic Trading Strategies (0) | 2023.03.20 |
---|---|
Introduction to Zipline and PyFolio (0) | 2023.03.20 |
Statistical Estimation, Inference, and Prediction (0) | 2023.03.20 |
Data Visualization Using Matplotlib (0) | 2023.03.20 |
Data Manipulation and Analysis with pandas (0) | 2023.03.20 |
댓글