Downloaders¶
Utilities to get data in bulk
- calcbench.downloaders.iterate_and_save_pandas(arguments, f, file_name, write_index=True, columns=None, write_mode='w')¶
Apply arguments to a function that returns a DataFrame and save to a .csv file.
- Parameters:
  - arguments (Sequence[T]) – Each item in this sequence will be passed to f
  - f (Callable[[T], DataFrame]) – Function that generates a pandas DataFrame; called on each item of arguments
  - file_name (Union[str, Path]) – Name of the file to write
  - write_index (bool) – Write the pandas index to the csv file
  - columns (Optional[Sequence[str]]) – Which columns to write. If this is set the index is not written
  - write_mode (Literal['w', 'a']) – Set the initial write mode. "a" to append, "w" to overwrite. Useful for resuming downloading.
Usage:
>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm]
>>> from calcbench.downloaders import iterate_and_save_pandas
>>> import calcbench as cb
>>> tickers = cb.tickers(entire_universe=True)
>>> iterate_and_save_pandas(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     file_name="calcbench_standardized_PIT.csv",
>>> )
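If a long download is interrupted, the write_mode parameter described above can be used to resume by appending to the existing file rather than overwriting it. A minimal sketch, assuming the tickers still to be fetched have been determined separately (remaining_tickers is a hypothetical list):
>>> from calcbench.downloaders import iterate_and_save_pandas
>>> import calcbench as cb
>>> # remaining_tickers is a hypothetical list of tickers not yet written to the csv file
>>> remaining_tickers = ["MSFT", "AAPL"]
>>> iterate_and_save_pandas(
>>>     arguments=remaining_tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     file_name="calcbench_standardized_PIT.csv",
>>>     write_mode="a",  # append to the existing file instead of overwriting it
>>> )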
- calcbench.downloaders.iterate_and_save_parquet(arguments, f, root_path, partition_cols=['ticker'], write_mode='w', parquet_file=None, csv_root=None)¶
Apply the arguments to a function and save the results to a pyarrow dataset.
- Parameters:
  - arguments (Sequence[T]) – Each item in this sequence will be passed to f
  - f (Callable[[T], DataFrame]) – Function that generates a pandas DataFrame; called on each item of arguments
  - root_path (Union[str, Path]) – Folder in which to write the pyarrow dataset
  - partition_cols – What to name the files in the dataset
  - write_mode (Literal['w', 'a']) – "w" to start by deleting the dataset directory, "a" to add files.
  - parquet_file (Union[str, Path, None]) – If supplied, create a single parquet file after we have all the data.
  - csv_root (Union[str, Path, None]) – Folder in which to write the data set as csv files.
Usage:
>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm,pyarrow]
>>> tickers = sorted(cb.tickers(entire_universe=True), key=lambda ticker: hash(ticker))  # randomize the order so the time estimate is better
>>> iterate_and_save_pyarrow_dataset(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     partition_cols=["ticker"],
>>> )
>>> # Read the dataset
>>> import pyarrow.parquet as pq
>>> import pyarrow.compute as pc
>>> table = pq.read_table(<root_path>)
>>> expr = pc.field("ticker") == "MSFT"
>>> msft_data = table.filter(expr).to_pandas()
>>>
>>> # Write the data
>>> pq.write_table(table, "C:/Users/andre/Downloads/standardized_data.parquet")
>>> from pyarrow import csv
>>> csv.write_csv(table, "C:/Users/andre/Downloads/entire_universe_standardized_PIT.csv")
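The parquet_file parameter documented above can be used to also materialize the collected data as a single parquet file once the dataset has been written. A minimal sketch under that assumption; the output paths are illustrative:
>>> from calcbench.downloaders import iterate_and_save_parquet
>>> import calcbench as cb
>>> tickers = cb.tickers(entire_universe=True)
>>> iterate_and_save_parquet(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     parquet_file="~/standardized_PIT.parquet",  # also write one consolidated parquet file when done
>>> )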
- calcbench.downloaders.iterate_and_save_pyarrow_dataset(arguments, f, root_path, partition_cols=['ticker'], write_mode='w', parquet_file=None, csv_root=None)¶
Apply the arguments to a function and save the results to a pyarrow dataset.
- Parameters:
  - arguments (Sequence[T]) – Each item in this sequence will be passed to f
  - f (Callable[[T], DataFrame]) – Function that generates a pandas DataFrame; called on each item of arguments
  - root_path (Union[str, Path]) – Folder in which to write the pyarrow dataset
  - partition_cols – What to name the files in the dataset
  - write_mode (Literal['w', 'a']) – "w" to start by deleting the dataset directory, "a" to add files.
  - parquet_file (Union[str, Path, None]) – If supplied, create a single parquet file after we have all the data.
  - csv_root (Union[str, Path, None]) – Folder in which to write the data set as csv files.
Usage:
>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm,pyarrow]
>>> tickers = sorted(cb.tickers(entire_universe=True), key=lambda ticker: hash(ticker))  # randomize the order so the time estimate is better
>>> iterate_and_save_pyarrow_dataset(
>>>     arguments=tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     partition_cols=["ticker"],
>>> )
>>> # Read the dataset
>>> import pyarrow.parquet as pq
>>> import pyarrow.compute as pc
>>> table = pq.read_table(<root_path>)
>>> expr = pc.field("ticker") == "MSFT"
>>> msft_data = table.filter(expr).to_pandas()
>>>
>>> # Write the data
>>> pq.write_table(table, "C:/Users/andre/Downloads/standardized_data.parquet")
>>> from pyarrow import csv
>>> csv.write_csv(table, "C:/Users/andre/Downloads/entire_universe_standardized_PIT.csv")
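Because the dataset is written incrementally, additional companies can be added later by passing write_mode="a", which adds files instead of deleting the dataset directory first. A minimal sketch, assuming an existing dataset at the same root_path and a hypothetical list of extra tickers:
>>> from calcbench.downloaders import iterate_and_save_pyarrow_dataset
>>> import calcbench as cb
>>> # more_tickers is a hypothetical list of tickers not yet in the dataset
>>> more_tickers = ["NVDA", "ORCL"]
>>> iterate_and_save_pyarrow_dataset(
>>>     arguments=more_tickers,
>>>     f=lambda ticker: cb.standardized(company_identifiers=[ticker], point_in_time=True),
>>>     root_path="~/standardized_PIT_arrow/",
>>>     partition_cols=["ticker"],
>>>     write_mode="a",  # add files to the existing dataset rather than deleting it first
>>> )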
- calcbench.downloaders.iterate_to_dataframe(arguments, f)¶
Apply arguments to a function that returns a DataFrame, append each result, and return the combined DataFrame.
Usage:
>>> %pip install calcbench-api-client[Pandas,Backoff,tqdm]
>>> from calcbench.downloaders import iterate_to_dataframe
>>> import calcbench as cb
>>> tickers = cb.tickers(entire_universe=True)
>>> d = iterate_to_dataframe(
>>>     tickers,
>>>     lambda ticker: cb.point_in_time(
>>>         all_face=True,
>>>         all_footnotes=False,
>>>         company_identifiers=[ticker],
>>>         all_history=True,
>>>         include_preliminary=True,
>>>         include_xbrl=True,
>>>     ),
>>> )
- Return type:
DataFrame
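Since the combined DataFrame is returned in memory rather than written to disk, it can be persisted afterwards with standard pandas methods. A short sketch continuing the example above; the output path is illustrative, and the "ticker" column is assumed to be present as in the standardized data examples:
>>> d.to_parquet("point_in_time_face.parquet")  # persist the combined DataFrame
>>> d[d["ticker"] == "MSFT"].head()  # inspect a single company's rows, assuming a "ticker" column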