Stats 4#

School

University of Notre Dame

Course

2476

Subject

Statistics

Date

Apr 3, 2024

Uploaded by savirkris on coursehero.com

STAT 1430 Recitation 4: Correlation and Regression

Part 1: Correlation

1. Write down a data set that contains five (x, y) points whose correlation is EXACTLY NEGATIVE ONE (-1). Hint: A scatterplot might help.

Data Point   X value   Y value
1            5         9
2            6         8
3            7         7
4            8         6
5            9         5

Properties of correlation. See your lecture notes. Looking at the formula on the formula sheet for r may help, but also consider the statistical reasons for your answers, not solely math.

2. Correlation has ______.
a) The same units as the data
b) No units

3. When finding the correlation between two quantitative variables, you will get the same answer if you switch X and Y. Explain briefly. (See lecture notes)
Yes, you will get the same answer regardless of which variable is labeled X and which is labeled Y. The formula for r multiplies the standardized x and y values together, and multiplication is commutative, so swapping the axes won't affect the outcome (5 times 9 and 9 times 5 are both 45).

4. Scatterplots examine relationships between what type(s) of variables?
a) Categorical
b) Quantitative
c) Both categorical and quantitative variables

5. Correlation is affected by outliers. Explain why, briefly.
Yes, the correlation is affected by outliers, since the formula relies on means and standard deviations, which are heavily influenced by outliers and skewness.

6. Correlation is a measure of the strength and direction of what type of relationship between two quantitative variables?
Linear

7. Correlation of a sample (r) will always be a number between -1 and 1.

Bob wants to use house size to predict house price in Columbus. He starts out by making a scatterplot of data from 100 homes randomly selected from the current Columbus market. In this case:

8. On Bob's scatterplot, which variable makes the most sense to appear on the X axis?
House size is the best fit for the x-axis, since it is the variable being used to predict price.

The following scatterplot shows the relationship between vocabulary size and age. The variables vocabulary size and age have a correlation of 0.96. Answer the following questions based on the known facts.

[Figure: Scatterplot of Vocabulary size vs Age. x-axis: Age (1 to 6); y-axis: Vocabulary size (0 to 2500).]

9. True/false: Having a correlation of positive 0.96 means that there is a 96% chance that a subject at age 4 will have a vocabulary size of 1500 words.
a) True
b) False

10. True/false: The correlation of positive 0.96 means that 96% of the subjects observed had a vocabulary size that was greater than their age.
a) True
b) False

NOTE: Correlation does NOT ALWAYS imply causation: a correlation between two variables does not always mean that a change in one variable causes a change in the other (there could be other reasons the data have a strong correlation). If you control for the other variables, yes, a causal relationship can be established. That would mean doing a well-designed experiment. Otherwise, no.

11. Please interpret the correlation for this problem. (Remember the three things, and use the context of the problem)
There is a very strong, positive, linear relationship between age and vocabulary size.

12. By looking at this scatterplot we can see that higher ages correspond with a higher vocabulary size. But just because two variables are correlated does not mean one CAUSES the other. Explain why this case is a good example of that.
Socioeconomic status is one example of a possible third variable. Coming from a higher socioeconomic background could be associated with a subject having a larger vocabulary.

Part 2: Regression

For this recitation (and all future recitations), be sure to show all formulas, calculations, and units, where appropriate.

We are comparing years of education and hours on the internet in the last month, to see if a relationship exists. If a relationship does exist, we want to predict Internet use using education level. The output is given below. Assume a scatterplot shows a linear pattern.

Descriptive Statistics: Education, Internet

Variable   Mean     StDev   Variance   Minimum   Maximum
Educatio   11.000   1.920   3.687      7.000     17.000
Internet   26.316   9.411   88.570     2.000     54.000

Pearson's Correlation: 0.642

1. Would a line fit this data well? Interpret the correlation between Education and Internet use.
The correlation is 0.642, which indicates a moderate, positive, linear relationship between education and Internet use, so a line fits the data moderately well.

2. Define which variable is X and which one is Y. Use applicable units.
X = education level (years)
Y = Internet use (hours in the last month)

3. Find the slope of the best fitting line.
Slope = b1
b1 = (r)(sy / sx)
b1 = (0.642)(9.411 / 1.920) = 3.1468, or about 3.15

4. Identify what the units of the slope are for the regression line that was calculated here.
Hours spent on the internet per year of education

5. Find the Y-intercept of the best fitting line.
b0 = y-bar - (b1)(x-bar)
b0 = (26.316) - (3.1468)(11)
b0 = -8.299 (using the unrounded slope; the rounded slope 3.15 would give -8.334)

6. Find the equation of the best fitting line.
Y = b0 + b1x
Y (hat) = -8.299 + 3.15x

7. Use the line to predict Internet use for someone with 16 years of education.
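
As a check on the arithmetic in this recitation, here is a minimal Python sketch that recomputes the Part 2 slope, intercept, and prediction from the summary statistics in the output (r = 0.642, means 11.000 and 26.316, standard deviations 1.920 and 9.411). It also confirms that the five points (5, 9), (6, 8), (7, 7), (8, 6), (9, 5) have a correlation of exactly -1. The function names `pearson_r` and `line_from_summary` are illustrative choices, not part of the recitation or the formula sheet.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares line from summary statistics: b1 = r*sy/sx, b0 = ybar - b1*xbar."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Part 1: five points that fall exactly on a line with negative slope.
part1_x = [5, 6, 7, 8, 9]
part1_y = [9, 8, 7, 6, 5]
r_part1 = pearson_r(part1_x, part1_y)

# Part 2: X = education (years), Y = Internet use (hours in the last month).
slope, intercept = line_from_summary(
    r=0.642, mean_x=11.000, sd_x=1.920, mean_y=26.316, sd_y=9.411
)
predicted_hours = intercept + slope * 16  # prediction at 16 years of education

print(f"Part 1 correlation: {r_part1:.3f}")
print(f"Slope: {slope:.4f} hours per year of education")
print(f"Intercept: {intercept:.3f} hours")
print(f"Predicted Internet use at 16 years: {predicted_hours:.2f} hours")
```

Carrying the unrounded slope through, the prediction for 16 years of education comes out to roughly 42 hours of Internet use in the last month. Since 16 lies inside the observed education range (7 to 17 years), this prediction is not an extrapolation.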