_ Discuss basic ideas of linear regression and correlation.
_ Create and interpret a line of best fit.
_ Calculate and interpret the correlation coefficient.
_ Calculate and interpret outliers.
1.1 Introduction
Professionals often want to know how two or more variables are related. For
example, is there a relationship between the grade on the second math exam a
student takes and the grade on the final exam? If there is a relationship, what
is it and how strong is it?
In another example, your income may be determined by your education, your
profession, your years of experience, and your ability. The amount you pay a
repair person for labor is often determined by an initial amount plus an hourly
fee. These are all examples in which regression can be used. The type of data
described in the examples is bivariate data ("bi" for two variables). In
practice, statisticians often use multivariate data, meaning data with many
variables.
In this chapter, you will be studying the simplest form of regression, "linear
regression" with one independent variable (x). This involves data that fits a
line in two dimensions. You will also study correlation, which measures how
strong the relationship is.
1.2 Linear Equations
Linear regression for two variables is based on a linear equation with one
independent variable. It has the form:
y = a + bx
where a and b are constant numbers.
x is the independent variable, and y is the dependent variable. Typically, you
choose a value to substitute for the independent variable and then solve for the
dependent variable.
Example 1
The following examples are linear equations.
y = 3 + 2x
y = 0.01 + 1.2x
The graph of a linear equation of the form y = a + bx is a straight line. Any
line that is not vertical can be described by this equation.
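As a small illustration, the substitution described above can be sketched in Python using the first equation from Example 1, y = 3 + 2x. The function name y_value and the chosen x values are assumptions made for this sketch only. Equal steps in x produce equal steps in y, which is why the graph is a straight line.

```python
def y_value(x, a=3, b=2):
    """Return y = a + b*x; the defaults match y = 3 + 2x from Example 1."""
    return a + b * x

# Illustrative x values (not from the text), spaced one unit apart.
xs = [0, 1, 2, 3]
ys = [y_value(x) for x in xs]
print(ys)  # [3, 5, 7, 9]

# The change in y between neighboring points is constant and equals b.
steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
print(steps)  # [2, 2, 2]
```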
Example 2
Figure 1: Graph of the equation y = -1 + 2x.
Linear equations of this form occur in applications in the life sciences, social
sciences, psychology, business, and other fields.
Example 3
Svetlana tutors to make extra money for college. For each tutoring session, she
charges a one-time fee of $25 plus $15 per hour of tutoring. A linear equation
that expresses the total amount of money Svetlana earns for each session she
tutors is y = 25 + 15x.
What are the independent and dependent variables? What is the y-intercept and
what is the slope? Interpret them using complete sentences.
Solution
The independent variable (x) is the number of hours Svetlana tutors each
session. The dependent variable (y) is the amount, in dollars, Svetlana earns
for each session.
The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana
charges a one-time fee of $25 (this is when x = 0). The slope is 15 (b = 15).
For each session, Svetlana earns $15 for each hour she tutors.
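The equation y = 25 + 15x can be checked with a short Python sketch. The function name session_earnings and the sample hour values are assumptions introduced here for illustration; the fee and hourly rate come from the example.

```python
def session_earnings(hours):
    """Dollars earned for one session of the given length: y = 25 + 15x."""
    one_time_fee = 25  # y-intercept a, in dollars
    hourly_rate = 15   # slope b, in dollars per hour
    return one_time_fee + hourly_rate * hours

# Sample session lengths in hours (illustrative values only).
for hours in (0, 1, 2, 3.5):
    print(hours, session_earnings(hours))
```

At x = 0 the function returns the one-time fee alone, and each additional hour adds $15, matching the interpretation of the y-intercept and slope given in the solution.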
1.3 Scatter Plots
Before we take up the discussion of linear regression and correlation, we need
to examine a way to display the relation between two variables x and y. The most
common and easiest way is a scatter plot. The following example illustrates a
scatter plot.
1.4 Slope and Y-Intercept of a Linear Equation
For the linear equation y = a + bx, b = slope and a = y-intercept. Recall from
algebra that the slope is a number that describes the steepness of a line and
the y-intercept is the y-coordinate of the point (0, a) where the line crosses
the y-axis.
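To make the two definitions concrete, the following Python sketch recovers the slope and y-intercept from two points on a line. The helper name slope_and_intercept is hypothetical, and the two points were chosen here because they lie on the line from Figure 1, y = -1 + 2x.

```python
def slope_and_intercept(p1, p2):
    """Return (a, b) for the line y = a + b*x through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line cannot be written in the form y = a + bx.
        raise ValueError("vertical line: slope is undefined")
    b = (y2 - y1) / (x2 - x1)  # slope: change in y per unit change in x
    a = y1 - b * x1            # y-intercept: value of y when x = 0
    return a, b

# Two points on y = -1 + 2x (picked for this sketch).
a, b = slope_and_intercept((0, -1), (2, 3))
print(a, b)  # -1.0 2.0
```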
1.5 Facts about the Correlation Coefficient for Linear Regression
_ A positive r means that when x increases, y increases and when x decreases, y
decreases (positive correlation).
_ A negative r means that when x increases, y decreases and when x decreases, y
increases (negative correlation).
_ An r of zero means there is absolutely no linear relationship between x and y
(no correlation).
_ High correlation does not suggest that x causes y or y causes x. We say
"correlation does not imply causation." For example, every person who learned
math in the 17th century is dead. However, learning math does not necessarily
cause death!
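The sign behavior listed above can be seen in a minimal Python sketch. It assumes that r is the Pearson sample correlation coefficient, whose formula this section does not spell out. The function name pearson_r and all three data sets are made up for illustration and are not taken from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented only to show the sign of r.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # y tends to increase with x
falling = [6, 4, 5, 4, 2]   # y tends to decrease as x increases
u_shape = [4, 1, 0, 1, 4]   # y depends on x, but not linearly

r_rising = pearson_r(xs, rising)     # positive
r_falling = pearson_r(xs, falling)   # negative
r_u_shape = pearson_r(xs, u_shape)   # zero
print(r_rising, r_falling, r_u_shape)
```

The last data set shows why the third fact says "no linear relationship": y is clearly related to x there, yet r is zero because the relationship is not a line.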