<!--
title: 5
description:
published: true
date: 2026-02-11T14:43:25.309Z
tags:
editor: ckeditor
dateCreated: 2026-02-11T14:42:59.270Z
-->
<h2><span style="font-family:Arial, Helvetica, sans-serif;">General</span></h2>
<h3><span style="font-family:Arial, Helvetica, sans-serif;">Simple Linear Regression</span></h3>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Normal equations</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Least-squares calculations</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Solution for normal equations</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
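<p><span style="font-family:Arial, Helvetica, sans-serif;">As a rough illustration, the least-squares calculations can be sketched in Python. The notation is an assumption, not taken from the figures above: the fitted line is written ŷ = a + bx, S<sub>xx</sub> and S<sub>xy</sub> are the corrected sums of squares and cross-products, the function name <code>fit_line</code> is hypothetical, and the data are made up.</span></p>

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products (assumed notation).
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Made-up data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b = fit_line(x, y)

# The fitted a and b satisfy both normal equations (up to rounding):
#   sum(y)   = n*a      + b*sum(x)
#   sum(x*y) = a*sum(x) + b*sum(x^2)
lhs_1, rhs_1 = sum(y), len(x) * a + b * sum(x)
lhs_2 = sum(xi * yi for xi, yi in zip(x, y))
rhs_2 = a * sum(x) + b * sum(xi ** 2 for xi in x)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">Comparing the left and right sides of the two normal equations is a quick check that a hand calculation of a and b is correct.</span></p>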
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Standard error of estimate</span></p>
<figure class="image"><img src=""></figure>
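<p><span style="font-family:Arial, Helvetica, sans-serif;">A minimal sketch of the standard error of estimate, assuming the usual shortcut form s<sub>e</sub><sup>2</sup> = (S<sub>yy</sub> - S<sub>xy</sub><sup>2</sup> / S<sub>xx</sub>) / (n - 2). The function name and the data are illustrative only.</span></p>

```python
import math


def standard_error_of_estimate(x, y):
    """s_e for simple linear regression, using the shortcut formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # The numerator equals the sum of squared residuals (SSE).
    sse = s_yy - s_xy ** 2 / s_xx
    # n - 2 degrees of freedom: two parameters (a and b) were estimated.
    return math.sqrt(sse / (n - 2))


# Made-up data: S_xx = 5, S_xy = 9.5, S_yy = 18.75, so SSE = 0.70.
s_e = standard_error_of_estimate([1, 2, 3, 4], [2, 4, 5, 8])
```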
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Test Statistic for inferences concerning β</span></p>
<figure class="image"><img src=""></figure>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Or it can be written as</span></p>
<figure class="image"><img src=""></figure>
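<p><span style="font-family:Arial, Helvetica, sans-serif;">The test statistic can be sketched as follows, assuming the common form t = (b - β<sub>0</sub>) · √S<sub>xx</sub> / s<sub>e</sub> with n - 2 degrees of freedom, where β<sub>0</sub> is the hypothesized slope. The function name and the input values are hypothetical.</span></p>

```python
import math


def slope_t_statistic(b, beta_0, s_e, s_xx):
    """t statistic for H0: beta = beta_0 in simple linear regression.

    b      -- estimated slope
    beta_0 -- hypothesized slope (often 0)
    s_e    -- standard error of estimate
    s_xx   -- corrected sum of squares of x
    """
    # s_e / sqrt(s_xx) is the estimated standard error of the slope.
    return (b - beta_0) * math.sqrt(s_xx) / s_e


# Illustrative summary values: b = 1.9, s_e^2 = 0.35, S_xx = 5.
t = slope_t_statistic(b=1.9, beta_0=0.0, s_e=math.sqrt(0.35), s_xx=5.0)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">The resulting value is compared with a critical value of the t distribution with n - 2 degrees of freedom.</span></p>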
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Confidence interval for regression coefficient β</span></p>
<figure class="image"><img src=""></figure>
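<p><span style="font-family:Arial, Helvetica, sans-serif;">A sketch of the interval, assuming the usual form b ± t<sub>α/2</sub> · s<sub>e</sub> / √S<sub>xx</sub>. The critical value is passed in as a parameter because it is normally read from a t table; 4.303 below is the tabulated two-sided 95% value for 2 degrees of freedom, and the other inputs are made up.</span></p>

```python
import math


def slope_confidence_interval(b, s_e, s_xx, t_crit):
    """Confidence interval for the slope beta.

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, taken from a table.
    """
    half_width = t_crit * s_e / math.sqrt(s_xx)
    return b - half_width, b + half_width


# Illustrative values for n = 4 observations (2 degrees of freedom).
lower, upper = slope_confidence_interval(
    b=1.9, s_e=math.sqrt(0.35), s_xx=5.0, t_crit=4.303
)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">If the interval does not contain 0, the slope differs significantly from 0 at the corresponding level.</span></p>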
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Confidence interval for the mean of&nbsp;y when x = x<sub>0</sub></span></p>
<figure class="image"><img src=""></figure>
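<p><span style="font-family:Arial, Helvetica, sans-serif;">A sketch of this interval, assuming the standard form (a + bx<sub>0</sub>) ± t<sub>α/2</sub> · s<sub>e</sub> · √(1/n + (x<sub>0</sub> - x̄)<sup>2</sup> / S<sub>xx</sub>). All names and numbers are illustrative, and the critical value is assumed to come from a t table.</span></p>

```python
import math


def mean_response_interval(a, b, x0, s_e, n, x_bar, s_xx, t_crit):
    """Confidence interval for the mean of y at x = x0."""
    y_hat = a + b * x0
    # The interval widens as x0 moves away from the sample mean of x.
    half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# Illustrative summary values for n = 4 observations.
common = dict(a=0.0, b=1.9, s_e=math.sqrt(0.35), n=4,
              x_bar=2.5, s_xx=5.0, t_crit=4.303)
at_mean = mean_response_interval(x0=2.5, **common)
at_edge = mean_response_interval(x0=4.0, **common)
width_at_mean = at_mean[1] - at_mean[0]
width_at_edge = at_edge[1] - at_edge[0]
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">The interval is narrowest at x<sub>0</sub> = x̄, which is why estimates far from the centre of the data are less precise.</span></p>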
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Coefficient of correlation</span></p>
<figure class="image"><img src=""></figure>
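<p><span style="font-family:Arial, Helvetica, sans-serif;">The sample correlation coefficient can be sketched as r = S<sub>xy</sub> / √(S<sub>xx</sub> · S<sub>yy</sub>); this notation is assumed, and the data are made up.</span></p>

```python
import math


def correlation(x, y):
    """Sample coefficient of correlation r between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


r_noisy = correlation([1, 2, 3, 4], [2, 4, 5, 8])   # close to, but below, 1
r_exact = correlation([1, 2, 3, 4], [3, 5, 7, 9])   # points exactly on a line
```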
<p>&nbsp;</p>
<h3><span style="font-family:Arial, Helvetica, sans-serif;">Multiple Regression</span></h3>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Estimated Equation</span></p>
<figure class="image"><img src=""></figure>
<p>&nbsp;</p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Where</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">SST is the total sum of squares,</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">SSR is the sum of squares due to regression,</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">SSE is the sum of squares due to error.</span></p>
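<p><span style="font-family:Arial, Helvetica, sans-serif;">The three sums of squares can be sketched as follows. The decomposition SST = SSR + SSE is assumed to be the relationship intended here; it holds when ŷ comes from a least-squares fit that includes an intercept. The function name and data are illustrative.</span></p>

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)          # due to regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) # due to error
    return sst, ssr, sse


# Made-up data; y_hat are the least-squares fitted values for x = 1, 2, 3, 4.
y = [2, 4, 5, 8]
y_hat = [1.9, 3.8, 5.7, 7.6]
sst, ssr, sse = sums_of_squares(y, y_hat)
```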
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Multiple Coefficient of Determination</span></p>
<figure class="image"><img src=""></figure>
<p><span style="font-family:Arial, Helvetica, sans-serif;">This is the proportion of the variation in y that is explained by the estimated regression equation; multiply it by 100 to express it as a percentage.</span></p>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Adjusted Multiple Coefficient of Determination</span></p>
<figure class="image"><img src=""></figure>
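<p><span style="font-family:Arial, Helvetica, sans-serif;">R<sub>a</sub><sup>2</sup> is never larger than&nbsp;R<sup>2</sup>, and it is strictly smaller whenever the model contains at least one independent variable and&nbsp;R<sup>2</sup> &lt; 1.</span></p>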
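<p><span style="font-family:Arial, Helvetica, sans-serif;">Both coefficients can be sketched as below, assuming the usual definitions R<sup>2</sup> = 1 - SSE/SST and R<sub>a</sub><sup>2</sup> = 1 - (1 - R<sup>2</sup>)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The input values are made up.</span></p>

```python
def r_squared(sst, sse):
    """Multiple coefficient of determination (equivalently SSR / SST)."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Illustrative values: SST = 18.75, SSE = 0.70, n = 4, k = 1.
r2 = r_squared(sst=18.75, sse=0.70)
r2_adj = adjusted_r_squared(r2, n=4, k=1)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">The adjustment penalizes additional independent variables, so R<sub>a</sub><sup>2</sup> can fall when a variable that explains little is added, whereas R<sup>2</sup> cannot.</span></p>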
<p>&nbsp;</p>
<h3><span style="font-family:Arial, Helvetica, sans-serif;">Assumptions</span></h3>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Linearity</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">The relationship between the explanatory variable X and the response variable Y should be linear.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Methods for fitting a model to non-linear relationships exist but are beyond the scope of this course.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Check using a scatterplot of the data, or a residuals plot.</span></p>
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Nearly Normal Residuals</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">The residuals should be nearly normal.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">This condition may not be satisfied when there are unusual observations that do not follow the trend of the rest of the data.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Check using a histogram or a normal probability plot of the residuals.</span></p>
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Constant variability</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">The variability of points around the least squares line should be roughly constant.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">This implies that the variability of residuals around the 0 line should be roughly constant as well.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">It is also called homoscedasticity.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Check using a residuals plot: the vertical spread of the residuals should look roughly the same across the whole range of fitted values.</span></p>
<p>&nbsp;</p>
<h3><span style="font-family:Arial, Helvetica, sans-serif;">Testing for Significance</span></h3>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Whole Model</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">This test determines whether a significant relationship exists between the dependent variable y and the set of all the independent variables x.</span></p>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Setting&nbsp;H<sub>0</sub> and H<sub>1</sub>:</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p><span style="font-family:Arial, Helvetica, sans-serif;">&nbsp;</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
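<p><span style="font-family:Arial, Helvetica, sans-serif;">A sketch of the whole-model test statistic, assuming the usual F ratio F = (SSR / k) / (SSE / (n - k - 1)) with k and n - k - 1 degrees of freedom. The function name and inputs are hypothetical.</span></p>

```python
def f_statistic(ssr, sse, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    msr = ssr / k              # mean square due to regression
    mse = sse / (n - k - 1)    # mean square due to error
    return msr / mse


# Illustrative values: SSR = 18.05, SSE = 0.70, n = 4, k = 1.
f = f_statistic(ssr=18.05, sse=0.70, n=4, k=1)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">H<sub>0</sub> is rejected when F exceeds the critical value of the F distribution with k and n - k - 1 degrees of freedom. With a single independent variable, F equals the square of the t statistic for the slope.</span></p>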
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Coefficient of Individual X</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Setting&nbsp;H<sub>0</sub> and H<sub>1</sub>:</span></p>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<figure class="image"><img src=""></figure>
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Multicollinearity</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">The correlation among the independent variables.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">When two independent variables are highly correlated (as a rule of thumb,&nbsp;|r| &gt; 0.7), it becomes difficult to determine the separate effect of each on the dependent variable.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Every attempt should be made to avoid including independent variables that are highly correlated.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Two predictor variables are said to be collinear when they are correlated, and this collinearity complicates model estimation.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Adding a predictor that is strongly associated with one already in the model is generally avoided, because it often contributes little new information. The simplest adequate model, called the parsimonious model, is preferred.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">While collinearity cannot always be avoided in observational data, experiments are usually designed to prevent correlation among predictors.</span></p>
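<p><span style="font-family:Arial, Helvetica, sans-serif;">One simple way to screen for multicollinearity is to compute the correlation between every pair of independent variables and flag the pairs beyond the |r| &gt; 0.7 rule of thumb. The sketch below is only an illustration: the function names, variable names, and data are all made up.</span></p>

```python
import math
from itertools import combinations


def pearson_r(u, v):
    """Sample correlation between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def flag_collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| > threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up predictors: x2 is nearly a multiple of x1, x3 is unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [5, 1, 4, 2, 3],
}
flagged = flag_collinear_pairs(predictors)
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">Here only the pair (x1, x2) is flagged, which suggests keeping one of the two rather than both. Pairwise correlations do not detect every form of multicollinearity, so this is a first check rather than a complete diagnosis.</span></p>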
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Qualitative Independent Variables</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Examples include gender (male, female) and method of payment (cash, check, credit card).</span></p>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">For example,&nbsp;X<sub>1</sub> might represent gender where&nbsp;X<sub>1</sub> = 0&nbsp;indicates male and&nbsp;X<sub>1</sub> = 1 indicates female.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">In this case,&nbsp;X<sub>1</sub>&nbsp;is called a dummy or indicator variable.</span></p>
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">More Complex Qualitative Variables</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">If a qualitative variable has&nbsp;k&nbsp;levels,&nbsp;k - 1&nbsp;dummy variables are required, with each dummy variable being coded as 0 or 1.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">For example, a variable with levels A, B, and C could be represented by&nbsp;X<sub>1</sub>&nbsp;and X<sub>2</sub>&nbsp;values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Care must be taken in defining and interpreting the dummy variables.</span></p>
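<p><span style="font-family:Arial, Helvetica, sans-serif;">The k - 1 coding rule can be sketched as follows. The function name <code>dummy_code</code> and the choice of a baseline level (the level coded as all zeros) are assumptions made for illustration.</span></p>

```python
def dummy_code(levels, baseline):
    """Map each of k levels to a tuple of k - 1 dummy values (0 or 1).

    The baseline level is coded as all zeros; every other level gets a
    single 1 in its own position.
    """
    others = [level for level in levels if level != baseline]
    coding = {baseline: tuple(0 for _ in others)}
    for i, level in enumerate(others):
        coding[level] = tuple(1 if j == i else 0 for j in range(len(others)))
    return coding


# Three levels need two dummy variables (X1, X2).
coding = dummy_code(["A", "B", "C"], baseline="A")
```

<p><span style="font-family:Arial, Helvetica, sans-serif;">With A as the baseline this reproduces the coding above: (0, 0) for A, (1, 0) for B, and (0, 1) for C. Each dummy coefficient is then interpreted as the difference from the baseline level.</span></p>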
<p>&nbsp;</p>
<h4><span style="font-family:Arial, Helvetica, sans-serif;">Residual Analysis</span></h4>
<p><span style="font-family:Arial, Helvetica, sans-serif;">In multiple regression analysis it is preferable to use the residual plot against&nbsp;ŷ to determine if the model assumptions are satisfied.</span></p>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Standardized residuals are frequently used in residual plots for two purposes:</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Identifying outliers (typically, standardized residuals&nbsp;&lt; -2 or&nbsp;&gt; 2).</span></p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">Providing insight into the assumption that the error term&nbsp;e&nbsp;has a normal distribution.</span></p>
<p>&nbsp;</p>
<p><span style="font-family:Arial, Helvetica, sans-serif;">The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand; software such as the Excel regression tool can be used instead.</span></p>
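<p><span style="font-family:Arial, Helvetica, sans-serif;">For a single independent variable the calculation is still manageable, and a sketch may help show what the software computes. It assumes one common textbook definition, e<sub>i</sub> / (s<sub>e</sub> · √(1 - h<sub>i</sub>)) with leverage h<sub>i</sub> = 1/n + (x<sub>i</sub> - x̄)<sup>2</sup> / S<sub>xx</sub>. Software packages may scale residuals differently, and the multiple-regression case needs matrix calculations that are not shown. The function name and data are made up.</span></p>

```python
import math


def standardized_residuals(x, y):
    """Standardized residuals for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage: how far each x value lies from the mean of x.
    leverages = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return [e / (s_e * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]


std_res = standardized_residuals([1, 2, 3, 4], [2, 4, 5, 8])
# Apply the rule of thumb: flag observations beyond -2 or 2.
outliers = [i for i, r in enumerate(std_res) if abs(r) > 2]
```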