User Guide for CausalImpact Analysis Dashboard

Welcome to the CausalImpact Analysis Dashboard, a statistical tool for estimating the causal impact of interventions using Bayesian Structural Time Series (BSTS) models. This guide explains how to navigate the app’s interface and introduces the underlying statistical methods, so that you can run a causal impact analysis and interpret its results with confidence.

Flowchart

[Figure: flowchart]

Navigation

Main Menu

The main menu on the left sidebar is divided into several sections:

Analysis
– Causal Impact Analysis: Perform the primary causal impact analysis.
– Intro: Introduction and overview of the app.
– Progress Details: View detailed progress of ongoing analyses.
– MCMC Diagnostics: Inspect diagnostics of the MCMC process.
– Impact Plot: Visualize the impact analysis results.
– Ridgeline Plot: View ridgeline plots of MCMC densities.

Data
– Data Plot: Plot the time series data.
– Descriptive Statistics: View descriptive statistics of the data.
– Statistical Tests: Perform and view results of statistical tests.
– Time Series Plots: Visualize various time series plots.
– Animated Timeseries: View an animated timeseries plot.

Reports
– Analysis Report: Generate and view detailed analysis reports.
– Inferential Logic: Understand the inferential logic used in the analysis.
– Original Paper: Access the original paper by Brodersen et al. (2015).
– References: View references and additional reading material.

Visualizations
– Inferential Framework: Understand the inferential framework.
– Logic Diagram: Visualize the causal impact logic diagram.
– Correlation Network: Explore correlation networks in the data.

Contact
– Get in touch with the authors and maintainers of the app.

Varia
– Access miscellaneous features and additional content.

Analysis Steps

Parameterize Model

– Pre-Intervention Start Date: Select the start date of the pre-intervention period.
– Pre-Intervention End Date: Select the end date of the pre-intervention period.
– Post-Intervention Start Date: Select the start date of the post-intervention period.
– Post-Intervention End Date: Select the end date of the post-intervention period.
– MCMC Iterations: Set the number of MCMC iterations for the analysis.
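
The four dates above define two windows that must be ordered and must not overlap. The following Python sketch shows the kind of check that could be applied before running an analysis. The function name `validate_periods` and the example dates are hypothetical and are not taken from the app.

```python
from datetime import date

def validate_periods(pre_start, pre_end, post_start, post_end):
    """Return the (pre, post) windows if they are ordered and do not overlap."""
    if not pre_start < pre_end:
        raise ValueError("pre-intervention start must precede its end")
    if not post_start <= post_end:
        raise ValueError("post-intervention start must not follow its end")
    if not pre_end < post_start:
        raise ValueError("post-intervention period must begin after the pre-intervention period ends")
    return (pre_start, pre_end), (post_start, post_end)

# Hypothetical example: six months before the intervention, three months after.
pre, post = validate_periods(
    date(2023, 1, 1), date(2023, 6, 30),
    date(2023, 7, 1), date(2023, 9, 30),
)
```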

Analyze Data

Number of Seasons

The seasonal component in time series analysis accounts for periodic fluctuations that occur at regular intervals. For instance, monthly data might exhibit annual seasonality (e.g., higher sales in December). Specifying the number of seasons is crucial as it defines the length of the cycle. For monthly data with annual seasonality, the number of seasons would be 12. Accurately modeling seasonality helps in capturing recurring patterns, improving the model’s predictive power and the accuracy of causal impact assessment.
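
To make the role of the cycle length concrete, the Python sketch below extends a seasonal pattern using the noise-free form of the seasonal recursion described by Brodersen et al. (2015), in which each new seasonal effect is the negative sum of the previous S - 1 effects. The function name `seasonal_path` and the quarterly numbers are illustrative assumptions. For monthly data with annual seasonality, `n_seasons` would be 12.

```python
def seasonal_path(first_effects, n_seasons, n_steps):
    """Extend a seasonal pattern without noise.

    Each new effect is minus the sum of the previous n_seasons - 1 effects,
    so the effects over one full cycle always sum to zero.
    """
    if len(first_effects) != n_seasons - 1:
        raise ValueError("need exactly n_seasons - 1 starting effects")
    path = list(first_effects)
    while len(path) < n_steps:
        path.append(-sum(path[-(n_seasons - 1):]))
    return path[:n_steps]

# Hypothetical quarterly pattern (4 seasons) extended over three years.
quarterly = seasonal_path([3.0, -1.0, -4.0], n_seasons=4, n_steps=12)
```

Without noise the pattern repeats exactly every `n_seasons` steps. A fitted model adds a random disturbance at each step, which lets the seasonal shape drift slowly.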

State Component

The state component determines the underlying structure of the time series model. Common components include:

– AddLocalLevel: A level that follows a random walk, capturing gradual shifts in the series without a persistent slope.
– AddLocalLinearTrend: A level plus a slope, both of which follow random walks, so the direction of the trend can itself change over time.
– AddSeasonal: A set of seasonal effects that repeat with a fixed cycle length (the number of seasons).

Choosing the appropriate state component is vital for capturing the true data-generating process, thereby enhancing the model’s accuracy and robustness in estimating the causal impact.
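
The difference between a local level and a local linear trend is easiest to see by simulating both. The Python sketch below is a plain illustration of the two state equations, not the app’s implementation. The function names, seeds, and standard deviations are assumptions chosen for the example.

```python
import random

def simulate_local_level(n, level_sd, seed=0, start=0.0):
    """Level follows a random walk: level[t+1] = level[t] + noise."""
    rng = random.Random(seed)
    level, path = start, []
    for _ in range(n):
        path.append(level)
        level += rng.gauss(0.0, level_sd)
    return path

def simulate_local_linear_trend(n, level_sd, slope_sd, seed=0, start=0.0, slope=0.0):
    """Level moves by the current slope plus noise; the slope is itself a random walk."""
    rng = random.Random(seed)
    level, path = start, []
    for _ in range(n):
        path.append(level)
        level += slope + rng.gauss(0.0, level_sd)
        slope += rng.gauss(0.0, slope_sd)
    return path

wandering = simulate_local_level(100, level_sd=0.5)
trending = simulate_local_linear_trend(100, level_sd=0.5, slope_sd=0.05, slope=1.0)
```

With both standard deviations set to zero, the local linear trend reduces to a straight line and the local level to a constant, which is a quick way to sanity-check the recursions.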

Burn-in Period

The burn-in period refers to the initial iterations of the MCMC algorithm that are discarded to ensure the samples used for inference are drawn from the stationary distribution. This helps eliminate the influence of the starting values. A typical burn-in period ranges from 10% to 50% of the total iterations, depending on the complexity of the model and the convergence behavior. Properly setting the burn-in period ensures the reliability and accuracy of the posterior estimates.
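
Discarding burn-in amounts to dropping a leading slice of the draws. A minimal Python sketch, assuming the draws are stored in a list and the burn-in is given as a fraction (the name `discard_burn_in` is hypothetical):

```python
def discard_burn_in(draws, burn_fraction=0.25):
    """Drop the first burn_fraction of the draws and keep the rest."""
    if not 0.0 <= burn_fraction < 1.0:
        raise ValueError("burn_fraction must be in [0, 1)")
    n_burn = int(len(draws) * burn_fraction)
    return draws[n_burn:]

# Placeholder draws: 2,000 iterations, of which the first 500 are discarded.
draws = list(range(2000))
kept = discard_burn_in(draws, burn_fraction=0.25)
```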

Confidence Interval Level

The confidence interval level sets the coverage of the Bayesian credible interval reported for the causal effect. A 95% credible interval is a range that, given the model and the data, contains the true value with 95% posterior probability. Higher levels produce wider intervals because they must cover more of the posterior distribution, while lower levels produce narrower intervals. The level does not change the underlying uncertainty, only how much of it the interval displays.
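
An equal-tailed credible interval can be read directly from the posterior draws as a pair of quantiles. The Python sketch below uses linear interpolation between order statistics. The function name and the placeholder draws are assumptions, and the app may compute its intervals differently.

```python
def credible_interval(draws, level=0.95):
    """Equal-tailed interval: cut (1 - level) / 2 of the draws from each tail."""
    ordered = sorted(draws)
    alpha = (1.0 - level) / 2.0

    def quantile(p):
        pos = p * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(alpha), quantile(1.0 - alpha)

# Placeholder draws 0, 1, ..., 100 so the quantiles are easy to verify by eye.
draws = [float(i) for i in range(101)]
low_90, high_90 = credible_interval(draws, level=0.90)
low_50, high_50 = credible_interval(draws, level=0.50)
```

For the same draws, the 90% interval is wider than the 50% interval because it has to cover more posterior mass.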

Number of Simulations

The number of simulations, or MCMC iterations, determines the number of posterior samples generated. More simulations provide a better approximation of the posterior distribution, improving the reliability of the estimates. However, this also increases computational time. A typical range might be from 1,000 to 10,000 iterations, with the choice depending on the model’s complexity and the required precision of the estimates. Adequate sampling is essential for robust inference and accurate estimation of the causal impact.
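
The trade-off between precision and computing time can be quantified with the Monte Carlo standard error of a posterior mean, which shrinks with the square root of the number of draws. The Python sketch below assumes independent draws, which is optimistic for MCMC output, where autocorrelation makes the error larger. All names and numbers are illustrative.

```python
import math
import random
import statistics

def monte_carlo_se(draws):
    """Standard error of the sample mean, assuming independent draws."""
    return statistics.stdev(draws) / math.sqrt(len(draws))

rng = random.Random(1)
small = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
large = [rng.gauss(0.0, 1.0) for _ in range(4_000)]

# Four times as many draws roughly halves the Monte Carlo error.
ratio = monte_carlo_se(large) / monte_carlo_se(small)
```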

Inspect Diagnostics

Trace Plots

Trace plots display the sampled values of the MCMC chains across iterations. They are used to assess the convergence of the chains. Ideally, the plots should show a stable, horizontal pattern with no apparent trends, indicating that the chains have mixed well and are sampling from the target posterior distribution. Convergence is crucial for ensuring that the posterior estimates are reliable and representative of the true parameter distributions.

Density Plots

Density plots illustrate the distribution of the sampled values for each parameter. They provide a visual assessment of the posterior distribution’s shape and central tendency. Smooth, unimodal density plots are consistent with good mixing and convergence, although they do not prove it. Multimodal or irregular shapes may indicate problems with the sampling process or a posterior that is genuinely multimodal. Density plots help in understanding the uncertainty and variability in the parameter estimates.

Autocorrelation Plots

Autocorrelation plots show the correlation of the MCMC samples with their lagged values. High autocorrelation indicates that the samples are not independent, which can be a sign of poor mixing and slow convergence. Ideally, the autocorrelation should drop to zero quickly as the lag increases. Low autocorrelation is essential for ensuring that the effective sample size is large enough for reliable inference.
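
The quantity shown in an autocorrelation plot can be computed directly. The Python sketch below estimates the lag-k autocorrelation of a chain and compares independent draws with a deliberately "sticky" AR(1) chain. The chains are simulated for illustration and do not come from the app.

```python
import random

def autocorrelation(draws, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws)
    cov = sum((draws[i] - mean) * (draws[i + lag] - mean) for i in range(n - lag))
    return cov / var

rng = random.Random(42)
independent = [rng.gauss(0.0, 1.0) for _ in range(5_000)]

# Hypothetical poorly mixing chain: each draw keeps 90% of the previous one.
sticky = [0.0]
for _ in range(4_999):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

rho_independent = autocorrelation(independent, lag=1)  # near 0
rho_sticky = autocorrelation(sticky, lag=1)            # near 0.9
```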

Running Mean Plots

Running mean plots track the cumulative mean of the MCMC samples over iterations. They help in assessing the stability and convergence of the chains. The running mean should stabilize around a constant value, indicating that the chain has reached the target distribution. Stability in the running mean is a sign of good mixing and convergence.
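
The running mean is simply the average of all draws up to each iteration. A minimal Python sketch (the function name is an assumption):

```python
def running_mean(draws):
    """Cumulative mean of the chain after each iteration."""
    total, means = 0.0, []
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

means = running_mean([2.0, 4.0, 6.0, 4.0])
```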

Cumulative Quantile Plots

Cumulative quantile plots show how selected quantiles of the MCMC samples (for example the 2.5%, 50%, and 97.5% quantiles) evolve as more iterations are included. Quantile lines that flatten out indicate that the sampled distribution has stabilized, which is crucial for reliable posterior inference.
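
The same idea as a running mean applies to quantiles: recompute them on a growing prefix of the chain and check that they settle. The Python sketch below evaluates the quantiles every `step` iterations. The function names, the interpolation rule, and the default probabilities are assumptions for illustration.

```python
def sample_quantile(values, p):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def cumulative_quantiles(draws, probs=(0.025, 0.5, 0.975), step=100):
    """Quantiles of the first n draws for n = step, 2 * step, ..."""
    return [
        (n, [sample_quantile(draws[:n], p) for p in probs])
        for n in range(step, len(draws) + 1, step)
    ]

# Tiny placeholder chain so the result can be checked by hand.
medians = cumulative_quantiles(list(range(1, 11)), probs=(0.5,), step=5)
```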

View and Download Results

Interactive Impact Plot

Interactive impact plots visualize the observed and predicted series, allowing for dynamic exploration of the causal impact results. They provide an intuitive way to compare the actual data with the counterfactual predictions, facilitating a deeper understanding of the intervention’s effect.

Predicted vs. Observed

The predicted vs. observed plot compares the predicted values from the model with the actual observed data. This comparison helps in assessing the model’s accuracy and the validity of the causal impact estimates. Discrepancies between the two series highlight the areas where the model may need improvement.

Static Impact Plot

A static impact plot provides a snapshot of the causal impact analysis, showing the observed data, counterfactual predictions, and the estimated impact. This plot is useful for reporting and documenting the analysis results, offering a clear visual representation of the intervention’s effect.

Impact Summary

The impact summary presents a concise overview of the causal impact analysis, including key statistics such as the estimated effect size, credible intervals, and the posterior tail-area probability (reported as p). This summary helps in quickly understanding the main findings and how strongly the data support an effect.
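
Summary statistics of this kind can be derived from posterior draws of the effect. The Python sketch below computes a posterior mean, a 95% equal-tailed interval, and a one-sided tail probability. It is an illustration under simple assumptions: the function name and the draws are hypothetical, and the exact formulas used by the app may differ.

```python
import statistics

def summarize_effect(effect_draws):
    """Posterior mean, 95% equal-tailed interval, and one-sided tail probability."""
    # With n=40 the first and last cut points are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(effect_draws, n=40, method="inclusive")
    n = len(effect_draws)
    p_below = sum(d <= 0 for d in effect_draws) / n
    p_above = sum(d >= 0 for d in effect_draws) / n
    return {
        "mean": statistics.fmean(effect_draws),
        "lower_95": cuts[0],
        "upper_95": cuts[-1],
        "tail_prob": min(p_below, p_above),
    }

# Placeholder draws -10, -9, ..., 90 standing in for real posterior samples.
summary = summarize_effect([float(i) for i in range(-10, 91)])
```

A small tail probability means that almost all posterior draws of the effect share the same sign.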

Impact Report

The impact report generates a detailed HTML document that includes all aspects of the analysis, from the model specification and diagnostics to the results and interpretation. This comprehensive report serves as a valuable reference for documenting the analysis process and findings, facilitating transparency and reproducibility.

Data Analysis

Assumptions

In Bayesian Structural Time Series (BSTS) models, certain assumptions underpin the validity of the causal inferences drawn:

– Pre-Intervention and Post-Intervention Periods: Properly defining these periods ensures that the comparison between observed and counterfactual outcomes is meaningful. The pre-intervention period should ideally capture the underlying trend and seasonality unaffected by the intervention.
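– Stationarity: Unlike models that require a stationary series, BSTS represents trend and seasonality explicitly through state components, so the raw series does not need to be differenced or detrended first. What must remain stable is the relationship between the response and any control series, and the control series themselves must not be affected by the intervention.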
– Model Specification: Choosing the correct state components and seasonal effects is essential for accurately modeling the underlying data-generating process. Components like local level, local linear trend, and seasonal factors should be considered based on the data characteristics.

Diagnostics

Bayesian inference relies heavily on the quality of the posterior distributions sampled via MCMC. Therefore, diagnostics are vital:

– Convergence: Assessing convergence of the MCMC chains ensures that the sampled distributions are representative of the true posterior distribution. Convergence diagnostics such as Gelman-Rubin statistics help determine if multiple chains have converged to the same distribution.
– Effective Sample Size: Evaluating the effective sample size helps ensure that the number of independent samples is sufficient for reliable inference. This accounts for autocorrelation within the chains.
– Autocorrelation: Inspecting autocorrelation plots of the MCMC chains helps identify dependencies in the samples. High autocorrelation can indicate poor mixing and slow convergence.
– Stationarity: The Heidelberger and Welch diagnostic tests whether each chain is stationary, that is, whether the sampled distribution is stable over iterations.
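
Two of the diagnostics above can be sketched in a few lines of Python. The block below computes the basic Gelman-Rubin statistic, which requires at least two chains of equal length, and a simple effective sample size that sums positive autocorrelations until the first non-positive one. These are textbook formulas with hypothetical function names and simulated chains, not the app’s implementation.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

def effective_sample_size(draws, max_lag=200):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(draws)
    mean = statistics.fmean(draws)
    centered = [x - mean for x in draws]
    var = sum(c * c for c in centered)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

rng = random.Random(7)
chain_a = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
chain_b = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
chain_shifted = [rng.gauss(5.0, 1.0) for _ in range(1_000)]

r_hat_good = gelman_rubin([chain_a, chain_b])       # close to 1
r_hat_bad = gelman_rubin([chain_a, chain_shifted])  # well above 1
ess_independent = effective_sample_size(chain_a)    # close to the chain length
```

Values of the Gelman-Rubin statistic close to 1 are consistent with convergence, while values well above 1 indicate that the chains disagree.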

Analysis

The core of this analysis lies in Bayesian inference, which combines prior beliefs with observed data to form posterior distributions:

– Bayesian Structural Time Series (BSTS): This model incorporates priors about the state components and uses MCMC to estimate posterior distributions of the parameters. The model is flexible and can include various components like local levels, trends, and seasonality.
– CausalImpact Package: This package performs the causal impact analysis by comparing observed data with counterfactual predictions. The counterfactual prediction represents what would have happened in the absence of the intervention.
– Posterior Predictive Checks: These checks validate the model by comparing observed data with data simulated from the posterior distribution. They help assess the model’s fit and predictive power.
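
A posterior predictive check can be reduced to comparing a test statistic on the observed data with the same statistic on simulated datasets. The Python sketch below uses the maximum as the statistic. The function name, the choice of statistic, and the toy numbers are assumptions for illustration only.

```python
def posterior_predictive_pvalue(observed, simulated_datasets, statistic=max):
    """Share of simulated datasets whose statistic is at least the observed one."""
    obs_stat = statistic(observed)
    exceed = sum(statistic(sim) >= obs_stat for sim in simulated_datasets)
    return exceed / len(simulated_datasets)

# Toy numbers standing in for data simulated from a fitted model.
observed = [1.0, 2.0, 3.0]
simulated = [[1.0, 2.0, 2.0], [0.0, 5.0, 1.0], [3.0, 3.0, 3.0], [0.0, 0.0, 0.0]]
ppp = posterior_predictive_pvalue(observed, simulated)
```

Values near 0 or 1 suggest that the model fails to reproduce that feature of the data, whereas values near 0.5 indicate no conflict for the chosen statistic.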

Results Interpretation

Interpreting the results involves comparing the observed outcomes with the counterfactual predictions to draw valid causal inferences:

– Observed vs. Predicted: Comparing the observed data with the predicted (counterfactual) values helps assess the intervention’s impact. In the post-intervention period, the difference between the two series is the estimated causal effect.
– Credible Intervals: Credible intervals provide a range of plausible values for the causal effect, reflecting the uncertainty in the estimates. Interpreting these intervals is crucial for understanding the reliability of the conclusions.
– Cumulative Impact: Analyzing the cumulative impact plot reveals the overall effect of the intervention over time. This visualization helps understand the long-term consequences of the intervention.
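
The pointwise, cumulative, and relative effects follow from simple arithmetic on the post-intervention series. A minimal Python sketch with made-up numbers (the function name and values are hypothetical):

```python
from itertools import accumulate

def pointwise_and_cumulative(observed_post, predicted_post):
    """Effects in the post-intervention period: observed minus counterfactual."""
    pointwise = [o - p for o, p in zip(observed_post, predicted_post)]
    cumulative = list(accumulate(pointwise))
    relative = sum(pointwise) / sum(predicted_post)
    return pointwise, cumulative, relative

# Made-up post-intervention values: observed sales vs. counterfactual prediction.
pointwise, cumulative, relative = pointwise_and_cumulative(
    [12.0, 13.0, 15.0], [10.0, 10.0, 10.0]
)
```

Here the pointwise effects are 2, 3, and 5, the cumulative effect after three periods is 10, and the relative effect is 10 / 30, or about 33%.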

Drawing Logically Valid Inferences

The beauty of Bayesian inference lies in its ability to update beliefs in the light of new data. When interpreting the results, consider the following principles:

1. Prior and Posterior: Understand how the prior distributions influence the posterior estimates. Strong priors can significantly shape the posterior, especially with limited data.
2. Credible Intervals: Unlike frequentist confidence intervals, Bayesian credible intervals have a direct probabilistic interpretation. Given the model and the data, a 95% credible interval contains the true parameter with 95% probability.
3. Model Fit and Predictive Power: Use posterior predictive checks to assess the model’s fit. Good predictive power indicates that the model can accurately capture the underlying data-generating process.
4. Sensitivity Analysis: Perform sensitivity analyses to understand how robust the results are to different priors and model specifications. This helps ensure that the conclusions are not unduly influenced by specific assumptions.
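
The influence of the prior, and the value of a sensitivity analysis, can be illustrated with the textbook case of a normal prior on a mean with known noise standard deviation. This conjugate example is a deliberate simplification and does not reproduce the priors used in BSTS. The function name and numbers are assumptions.

```python
import math

def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean and sd for a normal mean with known noise sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    data_mean = sum(data) / len(data)
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

data = [10.0, 10.0, 10.0, 10.0]

# Sensitivity analysis: same data, two different priors centred on zero.
strong_prior = normal_posterior(0.0, 0.5, data, noise_sd=1.0)    # pulled halfway to 0
weak_prior = normal_posterior(0.0, 100.0, data, noise_sd=1.0)    # stays close to 10
```

With only four observations, the strong prior moves the posterior mean from 10 to 5, while the weak prior leaves it almost unchanged. Results that shift this much between plausible priors should be reported with caution.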

Contact and Support

For any questions or support, please visit the Contact section of the app. Here, you can find a form to reach out to the authors and maintainers.

References

– Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274. https://doi.org/10.1214/14-AOAS788

– Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.