This will serve as a living collection of planned improvements over the next year. It is an expanded version of the Roadmap from README.md.
Performance and Scaling
- Memory Optimization: Use numpy.memmap for handling large datasets within simulation methods, so that parts of the data are loaded on demand and memory overhead is reduced. Prefer in-place operations (+=, *=) in numerical computations to avoid unnecessary copies and to lower peak memory usage.
- Profiling for Optimization: Use Python profiling tools such as cProfile (CPU time) and memray (memory allocations) to identify performance bottlenecks. Analyze the time complexity of critical functions and optimize them by improving the algorithms or by using more efficient data structures.
- Big Data Integration: Integrate with distributed computing frameworks like Apache Spark or Dask by adapting the time_series_simulator.py module to partition data processing across multiple nodes.
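
As an illustration of the Memory Optimization item above, here is a minimal sketch of chunked processing over a file-backed array. The helper names (write_demo_series, chunked_transformed_sum), the file name, and the transformation 2 * (x - 1) are assumptions made for the example; none of them come from tsbootstrap.

```python
import os
import tempfile

import numpy as np


def write_demo_series(path, n_obs):
    """Write 0, 1, ..., n_obs - 1 to a raw float64 file (demo data only)."""
    out = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_obs,))
    out[:] = np.arange(n_obs, dtype=np.float64)
    out.flush()
    del out  # drop the reference so the mapping is released


def chunked_transformed_sum(path, n_obs, chunk_size=100_000):
    """Sum 2 * (x - 1) over the file while holding only one chunk in RAM."""
    # Read-only mapping: pages are loaded on demand as slices are touched.
    series = np.memmap(path, dtype=np.float64, mode="r", shape=(n_obs,))
    buf = np.empty(chunk_size, dtype=np.float64)  # reused for every chunk
    total = 0.0
    for start in range(0, n_obs, chunk_size):
        chunk = series[start:start + chunk_size]
        work = buf[:len(chunk)]
        np.copyto(work, chunk)  # copy one chunk from disk into the buffer
        work -= 1.0  # in-place: no temporary array is allocated
        work *= 2.0
        total += float(work.sum())
    del series
    return total


# Hypothetical file standing in for a large on-disk dataset.
path = os.path.join(tempfile.mkdtemp(), "series.dat")
write_demo_series(path, n_obs=1_000_000)
print(chunked_transformed_sum(path, n_obs=1_000_000))
```

The reusable buffer keeps peak memory at roughly one chunk regardless of file size; writing `work = (chunk - 1.0) * 2.0` instead would allocate two temporaries per chunk.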
Tuning and Automation
- Adaptive Block Length: Develop algorithms in block_resampler.py that adjust block sizes dynamically based on the autocorrelation properties of the input data, optimizing the balance between bias and variance in bootstrap samples.
- Fractional Block Length: Modify the block length handling logic to accept and correctly process fractional lengths, providing finer granularity in block resampling.
- Adaptive Resampling: Implement adaptive resampling methods that modify the sampling technique based on real-time analysis of the dataset’s variance and skewness to improve the representativeness of bootstrap samples.
- Feedback-Driven Accuracy: Establish feedback loops in bootstrap.py that compare statistical properties of the original and bootstrapped datasets and iteratively refine the resampling process to minimize errors.
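
To make the Adaptive Block Length idea concrete, one simple heuristic is to pick the first lag at which the sample autocorrelation is no longer distinguishable from zero. The sketch below assumes a rough 2/sqrt(n) significance band and a cap on the lag; the function names are hypothetical, and this is one possible rule rather than the algorithm planned for block_resampler.py.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = [1.0]
    for lag in range(1, max_lag + 1):
        acf.append(float(np.dot(x[:-lag], x[lag:])) / denom)
    return np.array(acf)


def heuristic_block_length(x, max_lag=None):
    """First lag whose |ACF| falls inside the 2/sqrt(n) band (assumed rule)."""
    n = len(x)
    if max_lag is None:
        max_lag = max(1, min(n // 4, 100))  # assumed cap, not a library default
    acf = sample_acf(x, max_lag)
    threshold = 2.0 / np.sqrt(n)
    insignificant = np.flatnonzero(np.abs(acf[1:]) < threshold)
    if insignificant.size == 0:
        return max_lag  # dependence persists over all inspected lags
    return int(insignificant[0]) + 1  # convert 0-based position to a lag


# Usage: a strongly autocorrelated AR(1) series gets longer blocks than noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
print(heuristic_block_length(noise), heuristic_block_length(ar1))
```

Longer blocks preserve more of the dependence structure (lower bias) at the cost of fewer distinct blocks to draw from (higher variance), which is the trade-off the item refers to.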
Real-Time and Stream Data
- Real-Time Bootstrapping: Enable bootstrap.py to process data in real time by incorporating event-driven programming or reactive frameworks that handle data streams efficiently.
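
One way to picture streaming support is a generator that keeps a sliding window over the incoming observations and emits a moving-block bootstrap replicate each time the window is full. This is only a sketch under those assumptions: stream_bootstrap and moving_block_resample are hypothetical names, and a plain Python iterable stands in for an event source or reactive framework.

```python
from collections import deque

import numpy as np


def moving_block_resample(window, block_length, rng):
    """Draw one moving-block bootstrap replicate of a 1-D window."""
    n = len(window)
    n_blocks = -(-n // block_length)  # ceiling division
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [window[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the window length


def stream_bootstrap(stream, window_size, block_length, seed=None):
    """Yield one bootstrap replicate per observation once the window is full."""
    if not 1 <= block_length <= window_size:
        raise ValueError("block_length must be between 1 and window_size")
    rng = np.random.default_rng(seed)
    buffer = deque(maxlen=window_size)  # oldest values drop out automatically
    for value in stream:
        buffer.append(value)
        if len(buffer) == window_size:
            window = np.fromiter(buffer, dtype=float, count=window_size)
            yield moving_block_resample(window, block_length, rng)


# Usage: any iterable can act as the stream.
for replicate in stream_bootstrap(range(8), window_size=5, block_length=2, seed=0):
    print(replicate)
```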
Enhanced Composability with sktime
- Evaluation and Comparison Tools: Develop a standardized evaluation module within tsbootstrap to leverage sktime's comparison metrics (MASE, MAPE, etc.), enabling detailed performance analytics between bootstrapped and original time series data.
- Shared Datasets and Benchmarks: Establish a shared repository of time series datasets commonly used in both tsbootstrap and sktime. Then, create a suite of benchmark tests that automatically apply both resampling methods from tsbootstrap and forecasters from sktime to these datasets, allowing users to directly compare methodologies under identical conditions.
- Documentation and Examples: Create comprehensive documentation and tutorials that illustrate how tsbootstrap can be integrated with sktime, offering practical examples and best practices in leveraging the combined strengths of both libraries.
- Integration with Arbitrary sktime Forecasters: Enable the use of any sktime forecaster in forecaster-based bootstraps within tsbootstrap.
- Distribution and Sampler-like Object: Use tsbootstrap bootstraps to create a distribution or sampler-like object, enhancing the probabilistic forecasting capabilities.
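
A distribution or sampler-like object could be as small as a wrapper around a set of bootstrapped paths. The class below is a hypothetical sketch: EmpiricalBootstrapDistribution is not part of tsbootstrap or sktime, and it assumes the bootstrapped paths have already been collected into a 2-D array of shape (n_bootstraps, horizon).

```python
import numpy as np


class EmpiricalBootstrapDistribution:
    """Hypothetical empirical distribution over bootstrapped paths."""

    def __init__(self, samples, seed=None):
        self.samples = np.asarray(samples, dtype=float)
        if self.samples.ndim != 2:
            raise ValueError("samples must have shape (n_bootstraps, horizon)")
        self._rng = np.random.default_rng(seed)

    def sample(self, n_samples=1):
        """Draw whole paths uniformly, with replacement."""
        idx = self._rng.integers(0, self.samples.shape[0], size=n_samples)
        return self.samples[idx]

    def mean(self):
        """Pointwise mean across bootstrapped paths."""
        return self.samples.mean(axis=0)

    def quantile(self, q):
        """Pointwise quantile(s) across bootstrapped paths."""
        return np.quantile(self.samples, q, axis=0)


# Usage with three made-up paths over a two-step horizon.
dist = EmpiricalBootstrapDistribution([[0, 10], [2, 20], [4, 30]], seed=0)
print(dist.mean())
print(dist.quantile([0.1, 0.9]))
print(dist.sample(4).shape)
```

Sampling whole rows rather than individual values keeps the temporal dependence within each drawn path intact.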
API Extension
- DataFrame Support: Adapt core functionalities to accept pd.DataFrame inputs, ensuring outputs maintain the original index and columns to seamlessly integrate with pandas workflows.
- Handling Panels and Hierarchical Data: Extend the API to support panel data and hierarchical time series, broadening the applicability of the library.
- Exogenous Data Integration: Enhance handling of exogenous variables within bootstraps to support complex forecasting models.
- Update and Streaming Capabilities: Develop methods to update and stream data through the bootstrapping process, facilitating real-time data analysis.
- Model State Management: Differentiate between fittable and pretrained models within the API, providing users with flexible model deployment options.
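
For the DataFrame Support item, the key point is that resampling happens on the underlying values while the original index and columns are reattached to each output. The following is a minimal sketch of that idea using a moving-block bootstrap; bootstrap_dataframe is a hypothetical helper, not an existing tsbootstrap function.

```python
import numpy as np
import pandas as pd


def bootstrap_dataframe(df, block_length, n_bootstraps, seed=None):
    """Yield moving-block bootstrap replicates that keep df's index and columns."""
    n = len(df)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(df)")
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    n_blocks = -(-n // block_length)  # ceiling division
    for _ in range(n_bootstraps):
        starts = rng.integers(0, n - block_length + 1, size=n_blocks)
        # Rows are resampled jointly so cross-column relationships are kept.
        rows = np.concatenate([np.arange(s, s + block_length) for s in starts])[:n]
        yield pd.DataFrame(values[rows], index=df.index, columns=df.columns)


# Usage with a small made-up frame.
df = pd.DataFrame(
    {"a": np.arange(6.0), "b": np.arange(6.0) * 10},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)
for replicate in bootstrap_dataframe(df, block_length=2, n_bootstraps=2, seed=0):
    print(replicate)
```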
Adjacent Areas
- Time Series Augmentation: Explore and implement time series augmentation techniques to enrich training datasets and improve model robustness.
- Full Probabilistic Models: Develop full probabilistic models that can be sampled from, expanding the predictive capabilities of tsbootstrap.