Sunday, February 14, 2010

The third variable problem

In statistics, two variables may be correlated or unrelated, and a correlation may or may not reflect a causal relationship. When the value of one variable (the dependent variable) changes along with a change in the other (the independent variable), the two are likely correlated. The direction of the correlation may be positive or negative. When an increase in one variable is accompanied by an increase in the other, the variables are positively correlated (e.g. more hours of study going with higher exam scores). If an increase in one variable is accompanied by a fall in the other, they are negatively correlated (e.g. resistance to disease declining with advancing age).

Two variables that move together may be merely correlated, or they may have a causal relationship. Consider the case of inflation and unemployment. The Phillips curve describes a short-run tendency for unemployment to be lower when inflation is higher, and economists have long debated how far one drives the other. The co-movement alone does not settle that question. Only when a change in one variable actually produces the change in the other is the relationship causal.

Now consider another example. In a country, it is found that when ice cream consumption goes up, the number of drownings goes up too. We cannot conclude from this that ice cream consumption leads to more drownings. In this case an unobserved third variable, such as hot weather, which plausibly increases both ice cream sales and swimming, produces a relationship between the two variables that looks causal but is not. A spurious relationship of this kind, created by an unobserved third variable, is called “the third variable problem”.
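The ice cream example can be illustrated with a small simulation. This is only a sketch on made-up data: the third variable is assumed to be temperature, and the coefficients, noise levels, and the helper names `pearson` and `residuals` are hypothetical choices for illustration, not real measurements. Neither simulated variable depends on the other, yet the two come out clearly correlated. Once the straight-line effect of temperature is removed from each, the leftover correlation should be close to zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: temperature drives BOTH variables.
# Ice cream and drownings never influence each other here.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [3.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Raw correlation: looks like a strong relationship.
raw_r = pearson(ice_cream, drownings)

# Correlation after removing the effect of temperature from both.
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))

print(f"raw correlation:           {raw_r:.2f}")
print(f"controlling for temperature: {partial_r:.2f}")
```

Correlating the residuals is one simple way to "control for" a third variable. It works only when that variable has been identified and measured, which is exactly what is missing in the third variable problem.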
