Numeric Data

Calcbench extracts all of the GAAP numbers in section 8 (face statements and footnotes) of 10-K and 10-Q filings.

Standardized

Calcbench standardizes more than 1,000 metrics to handle differences in filers' tagging. The list of standardized points is at https://www.calcbench.com/home/standardizedmetrics

calcbench.standardized_data(company_identifiers=[], metrics=[], start_year=None, start_period=None, end_year=None, end_period=None, entire_universe=False, filing_accession_number=None, point_in_time=False, year=None, period=None, all_history=False, period_type=None, trace_hyperlinks=False, use_fiscal_period=False)

Standardized Data.

Metrics are standardized by economic concept and time period.

The data behind the multi-company page, https://www.calcbench.com/multi.

Parameters
  • company_identifiers (sequence) – Tickers/CIK codes, e.g. ['msft', 'goog', 'aapl', '0000066740']

  • metrics (sequence) – Standardized metrics, e.g. ['revenue', 'accountsreceivable']. Full list at https://www.calcbench.com/home/standardizedmetrics

  • start_year (int) – first year of data

  • start_period (int) – first period to get. Pass 0 for annual data, or 1, 2, 3, 4 for a quarter.

  • end_year (int) – last year of data

  • end_period (int) – last period to get. Pass 0 for annual data, or 1, 2, 3, 4 for a quarter.

  • entire_universe (bool) – Get data for all companies. This can take a while; talk to Calcbench before you do this in production.

  • filing_accession_number (int) – Filing Accession ID from the SEC's Edgar system.

  • year (int) – Get data for a single year, defaults to annual data.

  • period_type (str) – Either “annual” or “quarterly”.

Returns

DataFrame with the periods as the index and columns indexed by metric and ticker.

Return type

DataFrame

Usage:

>>> calcbench.standardized_data(company_identifiers=['msft', 'goog'], metrics=['revenue', 'assets'], all_history=True, period_type='annual')
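The period arguments share one convention: 0 requests annual data and 1 through 4 request a quarter. A minimal sketch of a helper that checks those values before a request is sent follows. `period_range_kwargs` is a hypothetical name and is not part of the calcbench package; the call to `calcbench.standardized_data` is left commented out because it needs network access and Calcbench credentials.

```python
def period_range_kwargs(start_year, start_period, end_year, end_period):
    """Build the period keyword arguments described above (hypothetical helper)."""
    for name, period in (("start_period", start_period), ("end_period", end_period)):
        # 0 means annual data; 1-4 select a quarter.
        if period not in (0, 1, 2, 3, 4):
            raise ValueError(f"{name} must be 0 (annual) or 1-4 (quarter), got {period!r}")
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")
    return {
        "start_year": start_year,
        "start_period": start_period,
        "end_year": end_year,
        "end_period": end_period,
    }


kwargs = period_range_kwargs(2018, 1, 2019, 4)

# Requires network access and Calcbench credentials:
# import calcbench
# calcbench.standardized_data(
#     company_identifiers=["msft", "goog"], metrics=["revenue"], **kwargs
# )
```

Validating locally keeps a mistyped period from turning into a slow or confusing remote call.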

Point-In-Time

Our standardized data with timestamps. Useful for backtesting quantitative strategies.

calcbench.point_in_time(company_identifiers=[], all_footnotes=False, update_date=None, metrics=[], all_history=False, entire_universe=False, start_year=None, start_period=None, end_year=None, end_period=None, period_type=None, use_fiscal_period=False, include_preliminary=False, all_face=False, include_xbrl=True)

Point-in-Time Data.

Standardized data with a timestamp of when it was published by Calcbench.

Usage:

>>> calcbench.point_in_time(company_identifiers=["msft", "goog"], all_history=True, all_face=True, all_footnotes=True)

https://github.com/calcbench/notebooks/blob/master/Point_In_Time_Face.ipynb
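To illustrate why publication timestamps matter for backtesting, here is a minimal sketch that picks, for a given date, the latest value already published by then. It works on plain dictionaries rather than the object `point_in_time` returns; the field names (`ticker`, `metric`, `period`, `value`, `published`) and the sample records are assumptions made up for illustration, not the actual Calcbench column names or data.

```python
from datetime import datetime


def known_as_of(records, as_of):
    """Return, per (ticker, metric, period), the latest value published on or before as_of."""
    latest = {}
    # Walk records oldest-first so later publications overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r["published"]):
        if rec["published"] <= as_of:
            latest[(rec["ticker"], rec["metric"], rec["period"])] = rec["value"]
    return latest


# Made-up records: an original figure and a later revision of the same period.
records = [
    {"ticker": "XYZ", "metric": "revenue", "period": "2018-Q1",
     "value": 100, "published": datetime(2018, 5, 1)},
    {"ticker": "XYZ", "metric": "revenue", "period": "2018-Q1",
     "value": 105, "published": datetime(2018, 8, 1)},
]

before_revision = known_as_of(records, datetime(2018, 6, 1))
after_revision = known_as_of(records, datetime(2018, 9, 1))
```

A strategy simulated as of June 2018 sees only the original figure, which avoids look-ahead bias from the later revision.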

Raw XBRL Data

Data as reported in the XBRL documents

calcbench.raw_xbrl_raw(company_identifiers=[], entire_universe=False, clauses=[])

Data as reported in the XBRL documents

Parameters
  • company_identifiers (list(str)) – list of tickers or CIK codes

  • entire_universe (bool) – Search all companies

  • clauses (list(dict)) – a sequence of dictionaries used to filter the data. Each clause is a dictionary with "value", "parameter" and "operator" keys. See the parameters that can be passed at https://www.calcbench.com/api/rawdataxbrlpoints

Usage:

>>> clauses = [
...     {"value": "Revenues", "parameter": "XBRLtag", "operator": 10},
...     {"value": "Y", "parameter": "fiscalPeriod", "operator": 1},
...     {"value": "2018", "parameter": "fiscalYear", "operator": 1},
... ]
>>> calcbench.raw_xbrl_raw(company_identifiers=['mmm'], clauses=clauses)
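Since every clause needs the same three keys, a small constructor can keep them consistent. This is a minimal sketch; `make_clause` is a hypothetical helper, not part of the calcbench package, and the operator codes are copied from the usage example above because their meaning is not described in this section.

```python
def make_clause(parameter, value, operator):
    """Return one filter clause with the three keys described above."""
    return {"value": value, "parameter": parameter, "operator": operator}


# Operator codes (10 and 1) are taken verbatim from the usage example;
# consult the API page linked above for what each code means.
clauses = [
    make_clause("XBRLtag", "Revenues", 10),
    make_clause("fiscalPeriod", "Y", 1),
    make_clause("fiscalYear", "2018", 1),
]

# Requires network access and Calcbench credentials:
# import calcbench
# calcbench.raw_xbrl_raw(company_identifiers=["mmm"], clauses=clauses)
```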

Dimensional

Segments: geographic and operating, and other dimensionalized tabular data.

calcbench.dimensional_raw(company_identifiers=None, metrics=[], start_year=None, start_period=None, end_year=None, end_period=None, period_type='annual')

Segments and Breakouts

The data behind the breakouts/segment page, https://www.calcbench.com/breakout.

Parameters
  • company_identifiers (sequence) – Tickers/CIK codes, e.g. ['msft', 'goog', 'aapl', '0000066740']

  • metrics (sequence) – list of dimension tuple strings. Get the list at https://www.calcbench.com/api/availableBreakouts and pass in the "databaseName".

  • start_year (int) – first year of data to get

  • start_period (int) – first period of data to get. 0 for annual data, 1, 2, 3, 4 for quarterly data.

  • end_year (int) – last year of data to get

  • end_period (int) – last period of data to get. 0 for annual data, 1, 2, 3, 4 for quarterly data.

  • period_type (str) – 'quarterly' or 'annual'. Only applicable when no other period arguments are supplied.

Returns

A list of points. The points correspond to the lines at https://www.calcbench.com/breakout. For each requested metric there will be the formatted value and the unformatted value, the latter denoted by an _effvalue suffix. The label is the dimension label associated with the values.

Return type

sequence

Usage:

>>> calcbench.dimensional_raw(company_identifiers=['fdx'], metrics=['OperatingSegmentRevenue'], start_year=2018)
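The `_effvalue` convention described under Returns can be sketched as follows. The sketch assumes each point is a dictionary with a `label` key plus one key per metric and one per `<metric>_effvalue`; that shape is an assumption, and the sample points are made up for illustration rather than real Calcbench output. `unformatted_values` is a hypothetical helper.

```python
def unformatted_values(points, metric):
    """Map each dimension label to the metric's unformatted (_effvalue) value."""
    key = metric + "_effvalue"
    # Skip points that do not carry this metric.
    return {point["label"]: point[key] for point in points if key in point}


# Made-up points for illustration only; real points come from dimensional_raw.
sample_points = [
    {"label": "Segment A",
     "OperatingSegmentRevenue": "1,000",
     "OperatingSegmentRevenue_effvalue": 1000},
    {"label": "Segment B",
     "OperatingSegmentRevenue": "250",
     "OperatingSegmentRevenue_effvalue": 250},
]

by_segment = unformatted_values(sample_points, "OperatingSegmentRevenue")
```

Working from the unformatted values avoids parsing display strings before doing arithmetic.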