Welcome to Cached Historical Data Fetcher documentation!

Cached Historical Data Fetcher

A Python utility for fetching arbitrary historical data with caching. Suited to data that is added frequently and incrementally, e.g. news, posts, weather, etc.

Installation

Install this via pip (or your favourite package manager):

pip install cached-historical-data-fetcher

Features

  • Uses a cache built on top of joblib, lz4, and aiofiles.

  • Ready to use with asyncio, aiohttp, and aiohttp-client-cache. Uses asyncio.gather to fetch chunks in parallel; a hedged aiohttp sketch follows this list. (For performance reasons, relying on aiohttp-client-cache alone is probably not a good idea when fetching a large number of chunks, i.e. web requests.)

  • Based on pandas and supports MultiIndex; a sketch follows the usage example below.
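
As a hedged illustration of the asyncio/aiohttp integration, the sketch below defines a get_one that downloads one chunk per day over HTTP, using the HistoricalDataCacheWithFixedChunk subclassing pattern described in the Usage section below. The URL and the shape of the JSON payload are assumptions made for illustration only.

from typing import Any

import aiohttp
from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp

class MyAiohttpCache(HistoricalDataCacheWithFixedChunk):
    delay_seconds: float = 0.5 # polite delay between chunk requests
    interval: Timedelta = Timedelta(days=1) # one chunk per day
    start_init: Timestamp = Timestamp.utcnow().floor("D") - Timedelta(days=7) # start a week ago

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        # Hypothetical endpoint; replace with a real API that accepts a date parameter.
        url = f"https://example.com/news?date={start.date()}"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                records = await response.json()
        # Assumes the payload is a list of records; index each row by the chunk start.
        return DataFrame(records, index=[start] * len(records))

df = await MyAiohttpCache().update()

Since results are cached with joblib and lz4 on disk, re-running update() later should only need to fetch chunks added since the previous run.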

Usage

Override the get_one method to fetch data for one chunk. The update method calls get_one for each chunk and concatenates the results.

from typing import Any

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp

class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk):
    delay_seconds: float = 0 # delay between chunks
    interval: Timedelta = Timedelta(days=1) # chunk interval
    start_init: Timestamp = Timestamp.utcnow().floor("10D") # start date

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        return DataFrame({"day": [start.day]}, index=[start])

df = await MyCacheWithFixedChunk().update()
print(df)

                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2
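
As noted in the Features list, MultiIndex data frames are supported. The following minimal sketch assumes that update() concatenates MultiIndex frames the same way it does the flat frames above; the "symbol" level and its values are made up for illustration.

from typing import Any

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, MultiIndex, Timedelta, Timestamp

class MyMultiIndexCache(HistoricalDataCacheWithFixedChunk):
    delay_seconds: float = 0 # delay between chunks
    interval: Timedelta = Timedelta(days=1) # chunk interval
    start_init: Timestamp = Timestamp.utcnow().floor("10D") # start date

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        # One row per (date, symbol) pair; "symbol" is a hypothetical second index level.
        index = MultiIndex.from_product([[start], ["AAPL", "MSFT"]], names=["date", "symbol"])
        return DataFrame({"day": [start.day, start.day]}, index=index)

df_multi = await MyMultiIndexCache().update()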

See example.ipynb for a real-world example.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!