Big data, linear regression model, analysis techniques, data type, statistics, simple regression model, multiple regression model, Ordinary least squares, Test of Joint Significance, common violations
Big data is the massive volume of both structured and unstructured data that are difficult to manage, process, and analyze using traditional data-processing tools.
[...] R² is the ratio of the explained variation of the dependent variable to its total variation.
Total variation in y = total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SSR + SSE$
Explained variation = regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
Unexplained variation = sum of squared errors: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$R^2 = SSR/SST = 1 - SSE/SST$ = the proportion of the sample variation in the dependent variable explained by the sample regression equation.
R² lies between 0 and 1; the closer to 1, the better the fit.
Adjusted R²
We cannot use R² for model comparison when the competing models do not include the same number of independent variables. [...]
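The decomposition SST = SSR + SSE and the two equivalent forms of R² can be checked numerically. A minimal Python sketch, assuming a hypothetical five-point data set (not from the course) whose OLS line is ŷ = 2.2 + 0.6x:

```python
# Minimal sketch: check SST = SSR + SSE and R^2 = SSR/SST = 1 - SSE/SST.
# Hypothetical toy data; y_hat = 2.2 + 0.6*x is the OLS line for these points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.2 + 0.6 * xi for xi in x]
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r2 = ssr / sst
print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f} R2={r2:.2f}")
# prints SST=6.00 SSR=3.60 SSE=2.40 R2=0.60
```

The identity SST = SSR + SSE holds here because ŷ comes from an OLS fit that includes an intercept; for an arbitrary line it generally fails.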
[...] It allows us to calculate the predicted value $\hat{y}$ of the dependent variable for any given values of the independent variables. The difference between the observed and the predicted values is the residual: $e = y - \hat{y}$.
Ordinary least squares (OLS): chooses the sample regression equation by minimising the sum of squared errors, $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$.
Formulas:
Standard error of the model: $s_e = \sqrt{s_e^2}$, in the same units of measurement as the dependent variable.
Sample variance: $s_e^2 = \frac{SSE}{n - k - 1}$ = average squared deviation between the observed and predicted values, where
SSE: sum of squared errors
k: number of independent variables
n: sample size
Increasing the number of independent variables lowers (or at least never raises) the numerator SSE, but it also lowers the denominator n - k - 1. [...]
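For the simple regression case, OLS has a closed-form solution: $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below, in plain Python with hypothetical helper names `ols_simple` and `standard_error` and hypothetical toy data, fits the line, computes the residuals e = y - ŷ, and then $s_e$ with k = 1:

```python
import math

def ols_simple(x, y):
    """OLS for y_hat = b0 + b1*x: the (b0, b1) that minimise SSE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error(y, y_hat, k):
    """s_e = sqrt(SSE / (n - k - 1)), in the units of y."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_simple(x, y)                          # b0 = 2.2, b1 = 0.6
y_hat = [b0 + b1 * xi for xi in x]                 # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # e = y - y_hat
s_e = standard_error(y, y_hat, k=1)                # sqrt(2.4 / 3), about 0.894
```

With an intercept in the model, the OLS residuals sum to zero, which is a quick sanity check on any fit.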
[...] The net effect on $s_e$ determines whether the added independent variable improves the fit. When comparing models with the same dependent variable, the model with the smaller $s_e$ is preferred.
R² definition: quantifies the proportion of the sample variation in the dependent variable that is explained by the sample regression equation. [...]
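To illustrate the net effect on $s_e$, a minimal Python sketch can compare two hypothetical models on the same toy data: model A with no independent variable (k = 0, predicting ȳ for every observation) and model B with one independent variable (k = 1), where ŷ = 2.2 + 0.6x is the OLS line for these five points. The helper name `s_e` is illustrative only:

```python
import math

def s_e(y, y_hat, k):
    """Standard error of the model: sqrt(SSE / (n - k - 1))."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Model A: no independent variable (k = 0); the prediction is the mean of y
y_bar = sum(y) / len(y)
se_a = s_e(y, [y_bar] * len(y), k=0)  # sqrt(6 / 4), about 1.22

# Model B: one independent variable (k = 1); OLS line for this toy data
se_b = s_e(y, [2.2 + 0.6 * xi for xi in x], k=1)  # sqrt(2.4 / 3), about 0.89

# Same dependent variable, so the smaller s_e is preferred
preferred = "B" if se_b < se_a else "A"
```

Adding x cuts SSE from 6 to 2.4 while the denominator only drops from 4 to 3, so the net effect lowers $s_e$ and model B is preferred.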
[...] Big data: CM1
3 types of analysis techniques:
Descriptive: what has happened; identify problems and solutions
Predictive: what could happen; uses historical data to estimate, etc.
Prescriptive: what should we do; optimize, simulate, explore, build, etc.
The 3 Vs of big data: Volume, Velocity, Variety
Statistical data types:
Cross-sectional data: data on a given sort of entity for a single period of time
Time-series data: data on a single entity for multiple periods of time
Panel data: data on multiple entities for multiple periods of time
The linear regression model
Definition: postulates that the relationship between the dependent variable and the independent variables is linear.
Dependent variable: y
Independent variables: $x_1, x_2, \dots, x_k$
A regression model treats all independent variables as numerical, so a categorical variable has to be represented by dummy variables.
Dummy variable d: used to describe 2 categories of a categorical variable
d = 1 for one of the categories
d = 0 for the other(s)
Simple regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
Multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
Predicted value of the dependent variable: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$
This equation is the model (the sample regression equation). [...]
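Dummy coding and the predicted-value equation can be sketched in a few lines of Python. The category names, the helper names `dummy` and `predict`, and the coefficients are all hypothetical; in practice the b's would come from an OLS fit:

```python
def dummy(values, category):
    """Dummy variable d: 1 for the chosen category, 0 for the other(s)."""
    return [1 if v == category else 0 for v in values]

def predict(b, xs):
    """Multiple regression prediction: y_hat = b0 + b1*x1 + ... + bk*xk."""
    b0, slopes = b[0], b[1:]
    if len(slopes) != len(xs):
        raise ValueError("need exactly one value per independent variable")
    return b0 + sum(bj * xj for bj, xj in zip(slopes, xs))

# Hypothetical categorical variable with two categories
region = ["urban", "rural", "urban", "rural"]
d_urban = dummy(region, "urban")  # [1, 0, 1, 0]

# Hypothetical coefficients: b0, b1 for a numerical x1, b2 for the dummy d
b = [10.0, 2.0, 5.0]
y_hat_urban = predict(b, [3.0, 1])  # 10 + 2*3 + 5*1 = 21.0
y_hat_rural = predict(b, [3.0, 0])  # 10 + 2*3 + 5*0 = 16.0
```

Holding x1 fixed, the two predictions differ by exactly the dummy coefficient b2 = 5, which is how a dummy coefficient is read.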
[...] R² never decreases as we add more variables, and in general it increases. Adjusted R² corrects for the number of independent variables:
$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$
k: number of independent variables
n: sample size
The higher the adjusted R², the better the model.
As n increases, adjusted R² gets closer to R².
When comparing models with the same dependent variable, the model with the higher adjusted R² is preferred.
Test of Joint Significance
Definition: a test of the overall usefulness of a regression; it determines whether the independent variables have a joint statistical influence on the dependent variable.
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$: all the slope coefficients equal 0, so all the independent variables drop out; none of the independent variables has a linear relationship with the dependent variable.
$H_A$: at least one slope coefficient $\beta_j \neq 0$; at least one of the independent variables influences the dependent variable.
The test statistic measures how well the sample regression equation explains the variability in the dependent variable. [...]
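A minimal Python sketch of adjusted R² and of the joint-significance statistic, assuming the usual F statistic $F = \frac{SSR/k}{SSE/(n - k - 1)}$ with $df_1 = k$ and $df_2 = n - k - 1$. The numbers (R² = 0.6, SSR = 3.6, SSE = 2.4, n = 5) and the second model are hypothetical toy values, and the function names are illustrative only:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1); needs n > k + 1."""
    if n <= k + 1:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(ssr, sse, n, k):
    """Joint-significance statistic F = (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

n = 5
# Hypothetical one-variable model: R^2 = 0.6
adj_one = adjusted_r2(0.6, n, k=1)   # about 0.467
# Hypothetical model with an extra variable that nudges R^2 up to 0.61
adj_two = adjusted_r2(0.61, n, k=2)  # 0.22
# F for the one-variable model, with df1 = 1 and df2 = 3
f_stat = f_statistic(3.6, 2.4, n, k=1)  # 4.5
```

The extra variable raises R² but lowers adjusted R², so the one-variable model would be preferred. F = 4.5 is compared with the critical value of the F distribution with df1 = 1 and df2 = 3 (about 10.13 at the 5% level), so on these toy numbers H0 would not be rejected.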