Regression Discontinuity: Data Analysis Explained

The term “Regression Discontinuity” refers to a statistical technique used in data analysis. It is a method that allows researchers to estimate the causal effect of a particular variable on an outcome, under certain assumptions. This method is particularly useful in situations where random assignment is not possible, making it a popular choice in fields such as economics, political science, and psychology.

Regression Discontinuity (RD) design is based on the idea that there is a discontinuity in the probability of treatment assignment at a certain threshold of a covariate. This threshold is often determined by policy or practical considerations, and individuals just above and below this threshold are assumed to be similar in all respects except for the treatment assignment. This assumption allows for a credible estimation of the causal effect of the treatment.

Understanding the Basics of Regression Discontinuity

Before diving into the intricacies of Regression Discontinuity, it is essential to understand some fundamental concepts that underpin this method. The first of these is the idea of a ‘treatment’ and a ‘control’ group. In RD design, individuals or units are divided into these two groups based on whether they fall above or below a certain threshold.

The ‘treatment’ group consists of those who receive the intervention or policy under study, while the ‘control’ group is made up of those who do not. The key assumption here is that, apart from the treatment, there are no systematic differences between the two groups around the threshold. This assumption is what allows us to estimate the causal effect of the treatment.

Threshold and Assignment Variable

The threshold in RD design is a crucial component. It is the value of the assignment variable at which the probability of treatment assignment changes. This could be a score on a test, an age limit, or any other measurable criterion. The assignment variable, on the other hand, is the variable used to determine whether an individual or unit receives the treatment or not.

For example, in a study looking at the effect of a scholarship program on student performance, the assignment variable could be the score on a qualifying exam, and the threshold could be the cut-off score required to receive the scholarship. Students who score just above the threshold would be in the treatment group (receiving the scholarship), while those who score just below would be in the control group (not receiving the scholarship).
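
The scholarship rule can be written down in a few lines. This is only an illustrative sketch: the cut-off of 70, the function name and the exam scores are all made up, and it assumes that a score exactly at the cut-off qualifies.

```python
CUTOFF = 70.0  # hypothetical cut-off score, not taken from any real program


def receives_scholarship(exam_score, cutoff=CUTOFF):
    """Sharp assignment rule: treated if and only if the score reaches the cut-off."""
    return exam_score >= cutoff


exam_scores = [55.0, 68.5, 69.9, 70.0, 70.1, 88.0]  # made-up scores
for score in exam_scores:
    group = "treatment" if receives_scholarship(score) else "control"
    print(f"score={score:5.1f} -> {group}")
```

The students scoring 69.9 and 70.1 end up in different groups even though their exam performance is almost identical, which is the comparison RD design exploits.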

Local Randomization and Counterfactual Outcomes

Another fundamental concept in RD design is local randomization: the assumption that, close to the threshold, assignment to the treatment or control group is as good as random. The idea is that individuals or units cannot precisely control which side of the threshold they fall on, so those just above and just below it are comparable in all respects except for the treatment assignment.

The counterfactual outcome, meanwhile, is what would have happened to the individuals or units in the treatment group had they not received the treatment. It cannot be observed directly, so RD design estimates it from the outcomes of control units just below the threshold. Under the assumption of local randomization, these units serve as a good approximation of what the treated units just above the threshold would have looked like in the absence of the treatment.
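
A rough way to see local randomization at work is to compare average outcomes in a narrow window on each side of the threshold. The sketch below uses simulated data; the cut-off of 70, the true effect of 5, the outcome equation and the function name `window_difference` are all assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(1)
CUTOFF = 70.0       # hypothetical cut-off score
TRUE_EFFECT = 5.0   # effect built into the simulation

# Simulated exam scores and a later outcome that rises smoothly with the score,
# plus a jump of TRUE_EFFECT for students at or above the cut-off.
scores = [random.uniform(40, 100) for _ in range(20000)]
outcomes = [
    50 + 0.1 * s + TRUE_EFFECT * (s >= CUTOFF) + random.gauss(0, 1)
    for s in scores
]


def window_difference(x, y, cutoff, half_width):
    """Difference in mean outcomes just above versus just below the cut-off."""
    above = [yi for xi, yi in zip(x, y) if cutoff <= xi < cutoff + half_width]
    below = [yi for xi, yi in zip(x, y) if cutoff - half_width <= xi < cutoff]
    return mean(above) - mean(below)


estimate = window_difference(scores, outcomes, CUTOFF, half_width=1.0)
print(f"difference in means within +/-1 point: {estimate:.2f}")
```

The narrower the window, the more comparable the two groups are, but the fewer observations remain. Because the outcome also trends with the score, a wide window would pick up part of that trend, which is one reason regression-based estimators are usually preferred.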

Types of Regression Discontinuity Designs

There are several types of Regression Discontinuity designs, each with its own set of assumptions and analytical techniques. The three most common types are the Sharp RD design, the Fuzzy RD design, and the Kink RD design.

Sharp RD design is the simplest and most straightforward type. In this design, the probability of treatment assignment jumps from 0 to 1 at the threshold. In other words, all individuals or units above the threshold receive the treatment, and all those below do not. This design is often used when the assignment to treatment is determined by a strict rule, such as a cut-off score on a test.

Fuzzy RD Design

In contrast to the Sharp RD design, the Fuzzy RD design allows for some degree of non-compliance with the treatment assignment rule. In this design, the probability of treatment assignment increases at the threshold, but not from 0 to 1. This means that some individuals or units above the threshold may not receive the treatment, and some below the threshold may receive it.

This design is often used in situations where the treatment assignment is not strictly enforced, such as when individuals can opt out of a program or policy. Two assumptions matter here. First, the probability of receiving the treatment must actually jump at the threshold. Second, the monotonicity assumption requires that crossing the threshold does not make any individual or unit less likely to receive the treatment. Together with continuity, these assumptions allow for the estimation of the Local Average Treatment Effect (LATE), which is the average causal effect of the treatment for the subpopulation of compliers, i.e., those whose treatment status changes because they cross the threshold.
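
One common way to compute a Fuzzy RD estimate is as a ratio: the jump in the outcome at the threshold divided by the jump in the share of units actually treated. The sketch below is a minimal illustration on simulated data, not a production estimator. The take-up rates (20% below, 80% above), the true effect of 4 and all function names are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def jump_at_cutoff(x, v, cutoff, bandwidth):
    """Difference between the two fitted lines evaluated at the cut-off."""
    left = [(xi - cutoff, vi) for xi, vi in zip(x, v) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, vi) for xi, vi in zip(x, v) if cutoff <= xi <= cutoff + bandwidth]
    return fit_line(*zip(*right))[0] - fit_line(*zip(*left))[0]


def fuzzy_rd(x, d, y, cutoff, bandwidth):
    """Jump in the outcome divided by the jump in treatment take-up."""
    return jump_at_cutoff(x, y, cutoff, bandwidth) / jump_at_cutoff(x, d, cutoff, bandwidth)


random.seed(7)
TRUE_EFFECT = 4.0
x = [random.uniform(-30, 30) for _ in range(20000)]
# Imperfect compliance: take-up rises from 20% to 80% at the cut-off (0).
d = [1 if random.random() < (0.8 if xi >= 0 else 0.2) else 0 for xi in x]
y = [2 + 0.05 * xi + TRUE_EFFECT * di + random.gauss(0, 1) for xi, di in zip(x, d)]

late = fuzzy_rd(x, d, y, cutoff=0.0, bandwidth=10.0)
print(f"jump in take-up: {jump_at_cutoff(x, d, 0.0, 10.0):.2f}, LATE estimate: {late:.2f}")
```

If the jump in take-up is small, the denominator is close to zero and the estimate becomes very noisy, so it is worth inspecting the take-up jump before interpreting the ratio.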

Kink RD Design

The Kink RD design is a variant of the RD design where the slope of the regression function, rather than the level, changes at the threshold. This design is often used in situations where the treatment is a continuous quantity whose relationship to the assignment variable changes slope at the threshold, rather than a status that switches on or off.

For example, in a study looking at the effect of a progressive tax policy on income, the assignment variable could be the income level, and the threshold could be the income level at which the tax rate increases. The treatment effect in this case would be estimated from the change in the slope of the outcome's relationship with the assignment variable at the threshold, scaled by the change in the slope of the tax schedule at the same point. The key assumption in Kink RD design is that the regression function is continuous at the threshold and that, absent the kink in the policy, its slope would also have been continuous there.
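
A Kink RD estimate can be sketched as a ratio of slope changes. The example below is simulated and heavily simplified. It assumes a hypothetical tax schedule with a marginal rate of 10% below the kink and 30% above it, income measured relative to the kink, an unspecified outcome that falls by 2 units per unit of tax, and made-up function names.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def slope_change(x, v, kink, bandwidth):
    """Fitted slope just above the kink minus fitted slope just below it."""
    left = [(xi - kink, vi) for xi, vi in zip(x, v) if kink - bandwidth <= xi < kink]
    right = [(xi - kink, vi) for xi, vi in zip(x, v) if kink <= xi <= kink + bandwidth]
    return fit_line(*zip(*right))[1] - fit_line(*zip(*left))[1]


random.seed(3)
TRUE_EFFECT = -2.0  # hypothetical effect of one unit of tax on the outcome
income = [random.uniform(-10, 10) for _ in range(10000)]  # relative to the kink at 0
# Tax owed is continuous in income, but its slope rises from 0.1 to 0.3 at the kink.
tax = [0.1 * xi if xi < 0 else 0.3 * xi for xi in income]
outcome = [1 + 0.5 * xi + TRUE_EFFECT * ti + random.gauss(0, 1) for xi, ti in zip(income, tax)]

kink_in_outcome = slope_change(income, outcome, kink=0.0, bandwidth=10.0)
kink_in_policy = slope_change(income, tax, kink=0.0, bandwidth=10.0)
estimate = kink_in_outcome / kink_in_policy
print(f"slope change in outcome: {kink_in_outcome:.3f}, in tax: {kink_in_policy:.3f}")
print(f"kink estimate: {estimate:.2f}")
```

Note that neither the tax nor the outcome jumps at the kink in this simulation. Only the slopes change, which is what distinguishes this design from the Sharp and Fuzzy designs.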

Estimation and Identification in Regression Discontinuity

Estimation and identification are two key steps in any RD analysis. Estimation refers to the process of calculating the treatment effect based on the observed data, while identification refers to the process of isolating the causal effect of the treatment from other potential confounding factors.

In RD design, the treatment effect is typically estimated by fitting a regression model to the data on either side of the threshold and comparing the predicted outcomes at the threshold. This is often done using local linear regression, which fits a separate linear regression line to the data on each side of the threshold. The difference in the predicted outcomes at the threshold is then taken as the estimate of the treatment effect.
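
The procedure described above can be sketched as follows. This is a minimal version under stated assumptions: it uses a uniform kernel (all observations inside the bandwidth get equal weight), a hand-picked bandwidth, and simulated data with a true effect of 3. The names `fit_line` and `rd_estimate` are made up for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_estimate(x, y, cutoff, bandwidth):
    """Fit one line on each side of the cut-off and compare them at the cut-off."""
    # Centre the assignment variable so each intercept is the prediction at the cut-off.
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


random.seed(42)
TRUE_EFFECT = 3.0
x = [random.uniform(0, 100) for _ in range(20000)]
y = [10 + 0.2 * xi + TRUE_EFFECT * (xi >= 50) + random.gauss(0, 1) for xi in x]

estimate = rd_estimate(x, y, cutoff=50, bandwidth=10)
print(f"estimated jump at the cut-off: {estimate:.2f}")
```

Applied work often replaces the uniform kernel with weights that decline with distance from the threshold and chooses the bandwidth with a data-driven rule. Both are left out here to keep the sketch short.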

Identification Assumptions

The key identification assumption in RD design is the continuity assumption. This assumption states that, in the absence of the treatment, the expected outcome would have followed a smooth function of the assignment variable across the threshold. In other words, any discontinuity in the observed outcome at the threshold is attributed to the treatment effect.
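
In potential-outcomes notation, the continuity assumption and the quantity it identifies in a Sharp RD design can be written compactly. The symbols are introduced here for illustration and do not appear elsewhere in this article: X is the assignment variable, c the threshold, Y the observed outcome, and Y(1) and Y(0) the outcomes with and without the treatment. The version below assumes that both conditional expectations are continuous at c.

```latex
% Assumption: E[Y(0) | X = x] and E[Y(1) | X = x] are continuous in x at x = c.
\[
\tau_{\mathrm{RD}}
  = \lim_{x \downarrow c} \mathbb{E}[\, Y \mid X = x \,]
  - \lim_{x \uparrow c} \mathbb{E}[\, Y \mid X = x \,]
  = \mathbb{E}[\, Y(1) - Y(0) \mid X = c \,]
\]
```

The first equality defines the estimand as the size of the jump in the observed outcome. The second holds only under the continuity assumption, which is why a violation of that assumption breaks the causal interpretation.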

This assumption is crucial for isolating the causal effect of the treatment from other potential confounding factors. However, it is not directly testable, and its validity must be assessed based on the context of the study and the plausibility of the assumption. If the continuity assumption is violated, the estimated treatment effect may be biased.

Robustness Checks

Because the estimate relies on the continuity assumption and on modelling choices made near the threshold, it is common practice to conduct robustness checks. These assess how sensitive the estimated treatment effect is to the choice of the bandwidth (the range of the assignment variable around the threshold used for estimation) and to the functional form of the regression model.
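
A simple bandwidth sensitivity check re-runs the same estimator over several bandwidths and compares the results. The sketch below uses simulated data with a true effect of 3. The candidate bandwidths and the function names are arbitrary choices for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_estimate(x, y, cutoff, bandwidth):
    """Local linear estimate of the jump at the cut-off (uniform kernel)."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
    return fit_line(*zip(*right))[0] - fit_line(*zip(*left))[0]


random.seed(11)
TRUE_EFFECT = 3.0
x = [random.uniform(0, 100) for _ in range(20000)]
y = [10 + 0.2 * xi + TRUE_EFFECT * (xi >= 50) + random.gauss(0, 1) for xi in x]

# Re-estimate with a narrow, a medium and a wide bandwidth.
estimates = {h: rd_estimate(x, y, cutoff=50, bandwidth=h) for h in (5, 10, 20)}
for h, est in estimates.items():
    print(f"bandwidth={h:>2}: estimate={est:.2f}")
```

Estimates that stay close to each other across bandwidths are reassuring. Large swings suggest that the result depends on the functional form rather than on a genuine jump at the threshold.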

One common robustness check is the placebo test, which involves shifting the threshold and re-estimating the treatment effect. If the estimated treatment effect is significantly different from zero at these placebo thresholds, this may suggest a violation of the continuity assumption. Another common robustness check is the falsification test, which involves applying the RD design to a variable that should not be affected by the treatment. If a significant treatment effect is found for this variable, this may also suggest a violation of the continuity assumption.
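
Both checks can reuse the main estimator. In the sketch below, the placebo cut-off of 30, the pre-treatment covariate `age` and the data are all simulated assumptions. The placebo test uses only untreated observations so that the real jump at 50 cannot contaminate it.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_estimate(x, y, cutoff, bandwidth):
    """Local linear estimate of the jump at the cut-off (uniform kernel)."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
    return fit_line(*zip(*right))[0] - fit_line(*zip(*left))[0]


random.seed(5)
x = [random.uniform(0, 100) for _ in range(20000)]
y = [10 + 0.2 * xi + 3.0 * (xi >= 50) + random.gauss(0, 1) for xi in x]
age = [30 + 0.05 * xi + random.gauss(0, 1) for xi in x]  # unaffected by the treatment

# Placebo test: pretend the cut-off is 30 and use untreated observations only.
x_below = [xi for xi in x if xi < 50]
y_below = [yi for xi, yi in zip(x, y) if xi < 50]
placebo_jump = rd_estimate(x_below, y_below, cutoff=30, bandwidth=10)

# Falsification test: look for a jump in a variable the treatment cannot affect.
covariate_jump = rd_estimate(x, age, cutoff=50, bandwidth=10)

real_jump = rd_estimate(x, y, cutoff=50, bandwidth=10)
print(f"real: {real_jump:.2f}, placebo: {placebo_jump:.2f}, covariate: {covariate_jump:.2f}")
```

In a real analysis each of these numbers would be accompanied by a standard error or confidence interval, which this sketch omits.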

Applications of Regression Discontinuity in Business Analysis

Regression Discontinuity has a wide range of applications in business analysis. It can be used to evaluate the impact of business policies and interventions such as pricing strategies, marketing campaigns, and human resource practices.

For example, a company might want to evaluate the impact of a new pricing strategy on sales. If the new pricing strategy is implemented for products above a certain price point, the company could use RD design to compare the sales of products just above and below this price point. Similarly, a company might want to evaluate the impact of a new training program on employee performance. If the training program is offered to employees above a certain performance level, the company could use RD design to compare the performance of employees just above and below this performance level.

Advantages and Limitations

One of the main advantages of RD design is that it provides a credible way to estimate causal effects in situations where random assignment is not possible. This makes it a powerful tool for business analysis, as it allows companies to evaluate the impact of their policies and interventions based on observational data.

However, RD design also has some limitations. One of the main limitations is that it only provides a local estimate of the treatment effect, i.e., the effect for individuals or units around the threshold. This means that the estimated effect may not be generalizable to other individuals or units. Another limitation is that it requires a clear and measurable threshold for treatment assignment, which may not always be available or easy to define.

Conclusion

Regression Discontinuity is a powerful statistical technique that allows for the estimation of causal effects based on observational data. While it has some limitations, its ability to provide credible estimates of treatment effects in situations where random assignment is not possible makes it a valuable tool in many fields, including business analysis.

By understanding the basics of RD design, the different types of RD designs, and the key steps in RD analysis, business analysts can leverage this method to evaluate the impact of various business policies and interventions. Furthermore, by being aware of the key assumptions and potential pitfalls of RD design, they can ensure the validity and robustness of their analyses.