Downloaders

Utilities to get data in bulk

calcbench.downloaders.iterate_and_save_pandas(arguments, f, file_name, write_index=True, columns=None, write_mode='w')

Apply arguments to a function that returns a DataFrame and save to a .csv file.

Parameters:
  • arguments (Sequence[TypeVar(T)]) – Each item in this sequence will be passed to f

  • f (Callable[[TypeVar(T)], DataFrame]) – Function that generates a pandas dataframe that will be called on arguments

  • file_name (Union[str, Path]) – Name of the file to write

  • write_index (bool) – Write the pandas index to the csv file

  • columns (Optional[Sequence[str]]) – Which columns to write. If this is set, the index is not written

  • write_mode (Literal['w', 'a']) – The initial write mode: “w” overwrites an existing file, “a” appends to it. Appending is useful for resuming an interrupted download.

Usage:

>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm]
>>> from calcbench.downloaders import iterate_and_save_pandas
>>> import calcbench as cb
>>> tickers = cb.tickers(entire_universe=True)
>>> iterate_and_save_pandas(
>>>    arguments=tickers,
>>>    f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>    file_name="calcbench_standardized_PIT.csv",
>>> )
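
One way to use write_mode="a" for resuming is to skip the arguments that already appear in a partially written file. The sketch below is only an illustration: remaining_arguments is a hypothetical helper, not part of calcbench, and it assumes the CSV contains a column named ticker, which you should check against your own output.

```python
import csv
from pathlib import Path
from typing import List, Sequence, Union


def remaining_arguments(
    file_name: Union[str, Path],
    arguments: Sequence[str],
    column: str = "ticker",  # assumed column name, not guaranteed by calcbench
) -> List[str]:
    """Return the arguments that do not yet appear in `column` of an existing CSV."""
    path = Path(file_name)
    if not path.exists():
        # Nothing has been written yet, so every argument is still to do.
        return list(arguments)
    with path.open(newline="") as handle:
        done = {row[column] for row in csv.DictReader(handle) if row.get(column)}
    # Preserve the caller's ordering of the remaining arguments.
    return [argument for argument in arguments if argument not in done]


# Hypothetical resume call (commented out because it needs Calcbench credentials):
# todo = remaining_arguments("calcbench_standardized_PIT.csv", tickers)
# iterate_and_save_pandas(arguments=todo, f=..., file_name="calcbench_standardized_PIT.csv", write_mode="a")
```

Arguments for which f returned no rows leave no trace in the file, so they would be requested again. If the earlier run was interrupted in the middle of a write, the last argument in the file may also be incomplete.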
calcbench.downloaders.iterate_and_save_parquet(arguments, f, root_path, partition_cols=['ticker'], write_mode='w', parquet_file=None, csv_root=None)

Apply the arguments to a function and save the results to a pyarrow dataset.

Parameters:
  • arguments (Sequence[TypeVar(T)]) – Each item in this sequence will be passed to f

  • f (Callable[[TypeVar(T)], DataFrame]) – Function that generates a pandas dataframe that will be called on arguments

  • root_path (Union[str, Path]) – Folder in which to write the pyarrow dataset

  • partition_cols – Columns to partition the dataset by; these determine how the files in the dataset are named

  • write_mode (Literal['w', 'a']) – “w” to start by deleting the dataset directory, “a” to add files

  • parquet_file (Union[str, Path, None]) – If supplied, create a single parquet file after all the data has been downloaded

  • csv_root (Union[str, Path, None]) – Folder in which to write the dataset as csv files

Usage:

>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm,pyarrow]
>>> import calcbench as cb
>>> from calcbench.downloaders import iterate_and_save_pyarrow_dataset
>>> tickers = sorted(cb.tickers(entire_universe=True), key=lambda ticker: hash(ticker)) # randomize the order so the time estimate is better
>>> iterate_and_save_pyarrow_dataset(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     partition_cols=["ticker"],
>>> )
>>> # Read the dataset
>>> import pyarrow.parquet as pq
>>> import pyarrow.compute as pc
>>> table = pq.read_table(<root_path>)
>>> expr = pc.field("ticker") == "MSFT"
>>> msft_data = table.filter(expr).to_pandas()
>>>
>>> # Write the data
>>> pq.write_table(table, "C:/Users/andre/Downloads/standardized_data.parquet")
>>> from pyarrow import csv
>>> csv.write_csv(table, "C:/Users/andre/Downloads/entire_universe_standardized_PIT.csv")
calcbench.downloaders.iterate_and_save_pyarrow_dataset(arguments, f, root_path, partition_cols=['ticker'], write_mode='w', parquet_file=None, csv_root=None)

Apply the arguments to a function and save the results to a pyarrow dataset.

Parameters:
  • arguments (Sequence[TypeVar(T)]) – Each item in this sequence will be passed to f

  • f (Callable[[TypeVar(T)], DataFrame]) – Function that generates a pandas dataframe that will be called on arguments

  • root_path (Union[str, Path]) – Folder in which to write the pyarrow dataset

  • partition_cols – Columns to partition the dataset by; these determine how the files in the dataset are named

  • write_mode (Literal['w', 'a']) – “w” to start by deleting the dataset directory, “a” to add files

  • parquet_file (Union[str, Path, None]) – If supplied, create a single parquet file after all the data has been downloaded

  • csv_root (Union[str, Path, None]) – Folder in which to write the dataset as csv files

Usage:

>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm,pyarrow]
>>> import calcbench as cb
>>> from calcbench.downloaders import iterate_and_save_pyarrow_dataset
>>> tickers = sorted(cb.tickers(entire_universe=True), key=lambda ticker: hash(ticker)) # randomize the order so the time estimate is better
>>> iterate_and_save_pyarrow_dataset(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     partition_cols=["ticker"],
>>> )
>>> # Read the dataset
>>> import pyarrow.parquet as pq
>>> import pyarrow.compute as pc
>>> table = pq.read_table(<root_path>)
>>> expr = pc.field("ticker") == "MSFT"
>>> msft_data = table.filter(expr).to_pandas()
>>>
>>> # Write the data
>>> pq.write_table(table, "C:/Users/andre/Downloads/standardized_data.parquet")
>>> from pyarrow import csv
>>> csv.write_csv(table, "C:/Users/andre/Downloads/entire_universe_standardized_PIT.csv")
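
The usage above orders the tickers by hash(ticker) to randomize them. Python salts string hashes per process unless PYTHONHASHSEED is set, so that order changes from run to run. If a repeatable order is preferred, for example when comparing or resuming runs, a seeded shuffle is one alternative. This is a minimal sketch: shuffled and its seed parameter are illustrative names, not part of calcbench, and the ticker list is a stand-in for cb.tickers(entire_universe=True).

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled(items: Sequence[T], seed: int = 0) -> List[T]:
    """Return a copy of items in a pseudo-random order that repeats for a given seed."""
    result = list(items)  # copy, so the caller's sequence is left untouched
    random.Random(seed).shuffle(result)  # private generator, global random state unaffected
    return result


tickers = ["MSFT", "AAPL", "GOOG", "AMZN", "ORCL"]  # stand-in for cb.tickers(entire_universe=True)
order_a = shuffled(tickers, seed=42)
order_b = shuffled(tickers, seed=42)
```

With the same seed and the same Python version, order_a and order_b are identical, and the result can be passed as arguments in place of the hash-sorted list.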
calcbench.downloaders.iterate_to_dataframe(arguments, f)

Apply each argument to a function that returns a DataFrame, append the results to a single DataFrame, and return it.

Usage:

>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm]
>>> from calcbench.downloaders import iterate_to_dataframe
>>> import calcbench as cb
>>> tickers = cb.tickers(entire_universe=True)
>>> d = iterate_to_dataframe(
>>>    tickers,
>>>    lambda ticker: cb.point_in_time(
>>>        all_face=True,
>>>        all_footnotes=False,
>>>        company_identifiers=[ticker],
>>>        all_history=True,
>>>        include_preliminary=True,
>>>        include_xbrl=True,
>>>    ),
>>> )
Return type:

DataFrame