Well, I think Mike McCoy's answer is "the right answer," but here's another way of thinking about it: linear regression looks for an approximation (up to the error $\epsilon$) of $y$ as a function of $x$. That is, we're given a non-noisy $x$ value, and from it we're computing a $y$ value, possibly with some noise. This situation is not symmetric in the variables -- in particular, flipping $x$ and $y$ means that the error is now in the independent variable, while our dependent variable is measured exactly.

One could, of course, find the equation of the line that minimizes the sum of the squares of the (perpendicular) distances from the data points. My guess is that the reason this isn't done is related to my first paragraph and "physical" interpretations in which one of the variables is treated as dependent on the other.

Incidentally, it's not hard to think up silly examples for which $B_x$ and $B_y$ don't satisfy anything remotely like $B_x \cdot B_y = 1$. The first one that pops to mind is to consider the least-squares line for the points $\{(0, 1), (1, 0), (-1, 0), (0, -1)\}$. (Or fudge the positions of those points slightly to make it a shade less artificial.) Another possible reason that the perpendicular-distances method is nonstandard is that it doesn't guarantee a unique solution -- see for example the silly example in the preceding paragraph. (N.B.: I don't actually know anything about statistics.)

**Regression line is not (always) the same as true relationship**

You may have some 'true' causal relationship with an equation in a linear form $a+bx$, like $$y := a + bx + \epsilon$$ where the $:=$ means that the value of $a+bx$ with some added noise $\epsilon$ is assigned to $y$.
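To make that silly example concrete, here is a minimal sketch (my own, not part of the answer) computing both least-squares slopes for the four points; `slope` is a small helper implementing the usual $\mathrm{cov}(x,y)/\mathrm{var}(x)$ formula:

```python
# Least-squares slopes for the points {(0, 1), (1, 0), (-1, 0), (0, -1)}.
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

xs, ys = zip(*[(0, 1), (1, 0), (-1, 0), (0, -1)])
b_y = slope(xs, ys)  # slope of y regressed on x
b_x = slope(ys, xs)  # slope of x regressed on y
print(b_y, b_x)      # 0.0 0.0 -- so B_x * B_y = 0, nowhere near 1
```

Both slopes are exactly zero because the sample covariance vanishes, which also illustrates the non-uniqueness of the perpendicular-distance fit for symmetric point sets like this one.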
The fitted regression line will not, in general, coincide with that true relationship.

**More precise relationship between slopes**

For two switched simple linear regressions $$Y = a_1 + b_1 X \\ X = a_2 + b_2 Y$$ you can relate the slopes as follows: $$b_1 = \rho^2 \frac{1}{b_2} \leq \frac{1}{b_2}$$ So the slopes are not each other's inverses.

**Intuition**

The reason is that each regression estimates a conditional expectation, not the inverse of the other regression's line.
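The identity $b_1 b_2 = \rho^2$ follows directly from the least-squares formulas $b_1 = \mathrm{cov}(X,Y)/\mathrm{var}(X)$ and $b_2 = \mathrm{cov}(X,Y)/\mathrm{var}(Y)$. A quick numerical sanity check (the simulation setup and variable names are mine, not from the answer):

```python
import random

random.seed(0)
n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]  # arbitrary illustrative slope

mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
vx = sum((a - mx) ** 2 for a in x) / n
vy = sum((b - my) ** 2 for b in y) / n

b1 = cov / vx                 # slope of Y regressed on X
b2 = cov / vy                 # slope of X regressed on Y
rho2 = cov ** 2 / (vx * vy)   # squared sample correlation
print(abs(b1 * b2 - rho2) < 1e-9)  # True: the identity holds in-sample
```

Note the identity holds exactly for the sample quantities, by algebra alone: $b_1 b_2 = \mathrm{cov}^2 / (\mathrm{var}(X)\,\mathrm{var}(Y)) = r^2$.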
You can imagine that the conditional probability relates to the strength of the relationship. Regression lines reflect this: the slopes of both lines may be shallow when the relationship is weak, or both steep when it is strong. The slopes are not simply each other's inverses.

**Example**

If two variables $X$ and $Y$ relate to each other by some (causal) linear relationship $$Y = \text{a little bit of } X + \text{a lot of error}$$ then you can imagine that it would not be good to entirely reverse that relationship when you wish to express $X$ based on a given value of $Y$. Instead of $$X = \text{a lot of } Y + \text{a little bit of error}$$ it would be better to also use $$X = \text{a little bit of } Y + \text{a lot of error}$$ See the following example distributions with their respective regression lines. The distributions are multivariate normal with $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = \rho$. The conditional expected values (what you would get in a linear regression) are $$\begin{array}{rcl} E(Y|X) &=& \rho X \\ E(X|Y) &=& \rho Y \end{array}$$ and in this case, with $X,Y$ a multivariate normal distribution, the conditional distributions are $$\begin{array}{rcl} Y|X & \sim & N(\rho X,\,1-\rho^2) \\ X|Y & \sim & N(\rho Y,\,1-\rho^2) \end{array}$$ So you can see the variable $Y$ as being a part $\rho X$ plus a part noise with variance $1-\rho^2$. The same is true the other way around. The larger the correlation coefficient $\rho$, the closer the two lines will be. But the lower the correlation, the weaker the relationship, the less steep the lines will be (this is true for both
lines).
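The multivariate-normal example can be checked by simulation: drawing $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with independent $X, Z \sim N(0,1)$, both fitted slopes should come out near $\rho$, not $\rho$ and $1/\rho$. (This sketch and its parameter choices are mine, not from the answer.)

```python
import math
import random

random.seed(1)
rho, n = 0.6, 100_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

def slope(us, vs):
    """OLS slope of vs regressed on us: cov(u, v) / var(u)."""
    m = len(us)
    mu, mv = sum(us) / m, sum(vs) / m
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var = sum((u - mu) ** 2 for u in us)
    return cov / var

print(round(slope(x, y), 2), round(slope(y, x), 2))  # both near 0.6
```

With $\rho = 0.6$, naively inverting the first regression would predict a slope of $1/0.6 \approx 1.67$ for the second; the simulation shows both slopes are about $0.6$ instead.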
**Can X and Y be interchanged in correlation?**

With correlation, the X and Y variables are interchangeable. Regression assumes X is fixed with no error, such as a dose amount or a temperature setting. With correlation, X and Y are typically both random variables, such as height and weight or blood pressure and heart rate.
**Is regression of y on x the same as regression of x on y?**

No. If Y depends on X, then the regression line is Y on X: Y is the dependent variable and X is the independent variable. If X depends on Y, then the regression line is X on Y, with X dependent and Y independent. The regression equation of Y on X, Y = a + bX, is used to estimate the value of Y when X is known.
**Can dependent and independent variables be switched?**

No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time.
**Is the correlation of X and Y the same as that of Y and X?**

Yes. The Pearson correlation coefficient of x and y is the same whether you compute pearson(x, y) or pearson(y, x).
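A tiny sketch of that symmetry (the `pearson` helper and the data are illustrative, not from the text):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson(x, y) == pearson(y, x))  # True: correlation is symmetric
```

The symmetry is visible in the formula itself: swapping the arguments swaps the roles of $x$ and $y$ in both the numerator and the denominator, leaving the ratio unchanged, whereas the regression slope divides by only one of the two variances.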