Models

Blas MOLA-YUDEGO

Modeling as a tool

What is a model? How is a linear regression fit and assessed? How do we construct a model and make predictions with it? In our case, a model is simply a (simplified) representation of reality, expressed in mathematical language. This section deals with linear regression models, starting with a single variable (the predictor, or independent variable) used to predict another one (the response, or dependent variable). The relationship between the two variables is modeled with a line, and some properties of the normal distribution are used to fit the parameters that define that line. The model is then assessed to measure its predictive power and to check whether we violate any of the assumptions behind the way the line was fit. We will review the assumptions of normality of the residuals, linearity, constant variance (homoscedasticity) and independence. On that basis, we will expand the model with more variables, check the effects of multicollinearity and see how to deal with them.
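
As a compact reference for the paragraph above, the simple regression line and its least-squares estimates can be written as follows. This is a sketch in standard textbook notation, not taken from the lecture notes; the symbols (x_i, y_i, the hats and the bars for sample means) are assumptions introduced here.

```latex
% Model: a line plus normally distributed, independent errors with constant variance
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)

% Least-squares estimates of the slope and the intercept
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% Coefficient of determination: share of the variation in y explained by the line
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i
```

The assumptions listed above (normality, linearity, constant variance and independence) all refer to the error term ε_i, which is why they are checked on the residuals y_i − ŷ_i after the line has been fit.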

Objectives

To understand the main assumptions of linear regression models
To fit linear regression models
To assess and evaluate models

Lecture notes

Modeling with simple regression. Blas Mola (2025) [PDF]. Lines as a tool. Simple regression turns a cloud of points into a quantified relationship. By pairing slopes and intercepts with their standard errors, t-tests, and indices, it converts that relationship into a scientific tool that helps assess assumptions, support good design, and incorporate uncertainty. Despite its simplicity, simple regression is a powerful methodology that helps make predictions, explain mechanisms and frame the discussion within the boundaries the data can actually support.
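
To make the idea of pairing a slope and an intercept with their standard errors and t-tests concrete, here is a minimal Python sketch using only the standard library. The function name `simple_regression` and the small data set are hypothetical, invented for illustration; they do not come from the lecture notes or the course Excel files.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, with standard errors and t statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept

    # Residual and total sums of squares
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)

    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    s2 = sse / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))

    return {
        "b0": b0, "b1": b1,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,  # t statistics for H0: parameter = 0
        "r2": 1 - sse / sst,
    }


# Hypothetical data, roughly following y = 2 + 3x with a little noise
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5.1, 7.9, 11.2, 13.8, 17.1, 20.3, 22.8, 26.1]

fit = simple_regression(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Each t statistic is the estimate divided by its standard error; comparing it with a t distribution with n − 2 degrees of freedom gives the usual p-value, which is left out here to keep the sketch free of external libraries.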

Materials

Excel for task [xls]

Excel for practice [xls]

Simple regression

Exercise pre-exam [PDF]

Exercise height-volume [PDF]

Exercise height-diameter-volume [PDF] [solutions]

Exercise wheat production [PDF]

Multiple regression exercises

Exercise height-diameter-value [PDF]

Exercise barley yields [PDF]

Videos and tutorials

Simple linear regression [youtube]

How to make our own sandbox model [video]

Visual representation of a nonsense model

Visual representation of p-hacking

[https://www.explainxkcd.com]

Tasks

What are the consequences of violating the main assumptions of linear regression models?

What is the role of the intercept (β₀) in a model?

How do you decide which variables to include in a model?

Exercise

We suggest you try the following tasks to practice the concepts explained in the lectures:

Create a large sequence of numbers following a normal distribution with mean = 0 and a chosen standard deviation (σ), using Excel or R. That will be the noise in the model.
Create a sequence of numbers, either random, systematic (e.g. 1 to 100) or following a normal distribution. That will be the x in the model.
Create a model, for instance y = 2 + 3x. In this case, β₀ = 2 and β₁ = 3. This is the true model of your data. If you plot it, it will look like a perfect line, with that exact formula and R² = 1.
Add the noise (error). That is, add the values from step 1 to y = 2 + 3x.
Now check how the model behaves in the figure. Increase the noise (increase the standard deviation (σ) in step 1). How does R² change? Are you being fooled by randomness? Do you see a “better picture” with a larger sample?
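
The steps above can also be sketched in Python, as an alternative to Excel or R. This is only one possible way to run the exercise, using the standard library. The names `fit_line` and `simulate`, the uniform x between 0 and 100, the seed and the chosen values of σ and n are all assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares intercept, slope and R² for a simple regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


def simulate(n, sigma, beta0=2.0, beta1=3.0, seed=1):
    """Generate data from the true model y = beta0 + beta1*x + noise and refit it."""
    rng = random.Random(seed)                      # fixed seed, so runs are repeatable
    noise = [rng.gauss(0, sigma) for _ in range(n)]   # step 1: noise with mean 0 and sd sigma
    x = [rng.uniform(0, 100) for _ in range(n)]       # step 2: random x (assumed range 0 to 100)
    y = [beta0 + beta1 * xi + e                       # steps 3 and 4: true model plus noise
         for xi, e in zip(x, noise)]
    return fit_line(x, y)


# Step 5: increase the noise and watch R² and the estimates change
for sigma in (0, 10, 50, 200):
    b0, b1, r2 = simulate(n=100, sigma=sigma)
    print(f"n=100   sigma={sigma:<4} b0={b0:8.2f}  b1={b1:6.3f}  R2={r2:.3f}")

# Same noise level, different sample sizes
for n in (10, 100, 10000):
    b0, b1, r2 = simulate(n=n, sigma=50)
    print(f"n={n:<6} sigma=50   b0={b0:8.2f}  b1={b1:6.3f}  R2={r2:.3f}")
```

Changing the `seed` argument gives a different random sample from the same true model, which is a quick way to see how much the estimates move by chance alone.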

How to do it?

Generate random numbers in Excel (uniformly distributed between 0 and 1): =RAND()
Generate numbers following a normal distribution with mean = 100 and standard deviation = 10: =NORMINV(RAND(),100,10)

In R:

Check here.

For more instructions, google (as I do)!