For grading purposes, this particular discussion posting area runs from Sunday Feb 21 through SATURDAY Feb 27, inclusively.
We explore so-called Two Variable Statistics this Week. This includes linear correlation, simple linear regression, the coefficient of determination, “correlation versus causation,” scatter plots, and more!
Please don’t forget to use an “outside” resource as part of the content and documentation for your first Post – the Post which is due on or before Wednesday of the Week – the Post where you make the most major contribution to the Weekly discussion posting area and attempt to address the discussion prompts / cues for the Week. It could possibly include a web site that you discovered on the internet at large, so long as the web site is relevant and substantial and does not violate the Chamberlain University policy for prohibited web sites, and so forth. It could possibly include references / resources that you discover through making use of the online Chamberlain University Library (please click Resources along the left and then click Library to discover the link to the Chamberlain University online Library).
Please check out the link below for some information about simple linear regression, the coefficient of determination, and the concept of standard error.
This is one kind of an example of using an “outside” source / resource to add to what is revealed in our Weekly Lesson in Modules and in our Weekly text book reading.
Please don’t forget to look over the Graded Discussion Posting Rubric each Week to be certain that you are meeting all of the Frequency requirements as well as all of the Quality requirements for graded discussion posting each Week.
Professor and Class,
Regression analysis is how we model possible cause-and-effect relationships and determine whether they are statistically sound. Correlation alone is not causation, which is why patterns and influence must be studied (Holmes, Illowsky, and Dean, 2017). If a regression analysis were done on BMI, there are many probable independent variables. The easiest and most common one to think of would be the patient’s diet. We could break this down and become more specific, such as total cholesterol intake or total fat intake. Other variables to consider would be exercise or illnesses such as lipedema or lymphedema. Conditions such as COPD and CHF are also important to consider. As mentioned in our lesson this week, correlation is not causation, and further experiments and statistical analysis are needed to determine whether the results reflect influence or coincidence.
A study published in Environmental Health Perspectives examined the correlation between air pollution and blood pressure, heart rate, and cardiac biomarkers. The dependent variables were blood pressure, heart rate, and the biomarkers, and the independent variable was exposure to air pollution. The study took place from 1995 to 2013. The results state, “We observed some evidence suggesting distributional effects of traffic-related pollutants on systolic blood pressure, heart rate variability, corrected QT interval, low density lipoprotein (LDL) cholesterol, triglyceride, and intercellular adhesion molecule-1 (ICAM-1)”. Their conclusion also uses tentative wording such as “may effect” (Bind, Peters, Koutrakis, Coull, Vokonas, and Schwartz, 2016). With this in mind, and given the lack of information about other factors related to the participants’ health, I would say it is difficult to exclude the possibility of coincidence in this specific study.
Bind, M., Peters, A., Koutrakis, P., Coull, B., Vokonas, P., & Schwartz, J. (2016). Quantile Regression Analysis of the Distributional Effects of Air Pollution on Blood Pressure, Heart Rate Variability, Blood Lipids, and Biomarkers of Inflammation in Elderly American Men: The Normative Aging Study. Environmental Health Perspectives. https://ehp.niehs.nih.gov/doi/10.1289/ehp.1510044#:~:text=Results%20%20%20%20Outcomes%20%20%20,%20%20%20%2014%20more%20rows
Holmes, A., Illowsky, B., & Dean, S. (2017). Introductory Business Statistics. OpenStax.
Both correlation and regression analysis are used to determine the relationships that exist between variables. In both cases, the variables need to be normally distributed. However, there is a difference between the two statistical approaches (Kasuya, 2019). While correlation is only used to measure the association between two continuous variables, regression analysis is used to determine the relationship between one dependent variable and one or more independent variables. Additionally, regression analysis is how we model possible cause-and-effect relationships and determine whether they are statistically sound. Correlation alone is not causation, which is why patterns and influence must be studied (Kasuya, 2019). Performing regression analysis on Body Mass Index (BMI) requires the consideration of different independent variables. Apart from diet and the level of physical activity, another possible independent variable would be the height of the study participants. Height is always used in the computation of BMI; therefore, it is one of the independent variables for BMI. Also, the values of height are continuous. However, data analysts need to ensure that there is a normal distribution.
Physical activity is known to reduce body mass index. In other words, regular physical activity aids in the breakdown of excess body fat that contributes to an increase in BMI. Also, overeating and overconsumption of junk or fatty foods have been established as major contributors to increases in BMI. Before undertaking correlation and regression analysis, normality tests are needed to ensure that both the dependent and independent variables meet the requirements for parametric tests or inferential statistical analysis.
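One way to see the difference between correlation and regression described above is a small sketch. The numbers below are hypothetical values invented for illustration (they are not from Kasuya or any study), and the helper names `pearson_r` and `slope` are my own. The correlation coefficient is the same whichever variable comes first, while the regression slope depends on which variable is treated as the dependent one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope for predicting y (dependent) from x (independent)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: weekly hours of physical activity and BMI for six people.
activity_hours = [1, 2, 3, 4, 5, 6]
bmi = [30.5, 29.8, 28.9, 27.5, 27.1, 25.9]

# Correlation is symmetric: swapping the variables gives the same r.
r_xy = pearson_r(activity_hours, bmi)
r_yx = pearson_r(bmi, activity_hours)

# Regression is not symmetric: the slope changes with the choice of dependent variable.
slope_bmi_on_activity = slope(activity_hours, bmi)
slope_activity_on_bmi = slope(bmi, activity_hours)

print(f"r (activity, BMI) = {r_xy:.3f}, r (BMI, activity) = {r_yx:.3f}")
print(f"slope, BMI predicted from activity: {slope_bmi_on_activity:.3f}")
print(f"slope, activity predicted from BMI: {slope_activity_on_bmi:.3f}")
```

Neither number says anything about causation by itself; both only describe how the two lists move together.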
Kasuya, E. (2019). On the use of r and r squared in correlation and regression (Vol. 34, No. 1, pp. 235-236). Hoboken, USA: John Wiley & Sons, Inc. Retrieved from: https://esj-journals.onlinelibrary.wiley.com/doi/abs/10.1111/1440-1703.1011
Hi Professor and classmates,
Congrats to all for surviving this course! Wasn’t this like trying to learn a new language in 8 weeks?
I found an excellent article in our library that compared different regression models for the best approach to predicting BMI: “Factors associated with overweight: are the conclusions influenced by choice of the regression method?” (Juvanhol et al., 2016). The bottom line was the authors recommend using a combination of different approaches, as these furnish complementary information to the multifactorial predictors of obesity. The article was a little over my head as it discussed gamma regression, which I couldn’t find in our textbook, and quantiles, which also is not in our text but seems a lot like quartiles. But thanks to this course, I was able to understand more of this article than I would have before this course.
In this article, BMI distribution percentiles are on the x-axis of the following charts. Along the y-axis are the values of the estimated coefficients for age, physical inactivity, years of night-shift work, BMI at age 20, domestic overload (cleaning/cooking/laundry factored by number of residents at home) and self-rated health. According to Juvanhol et al. (2016), these were the explanatory variables. This is still a little confusing to me, as Holmes et al. (2018) stated that a multivariate model or system is where more than one independent variable is used to predict an outcome, and there can only be one dependent variable, but unlimited independent variables. So why did the authors refer to age, etc., as explanatory variables, which would make them independent variables, but not put them on the x-axis?
Anyway, the independent variables are along the y-axis, and are shown in units of the values of the coefficients estimated. Coefficients provide an estimate of the impact of a unit change in the independent variable on the dependent variable (Holmes et al., 2018). The coefficient we use in a linear regression is the slope, or the rise over the run. However, this week we learned about another kind of coefficient, the coefficient of determination, which is the explained variation over the total variation (Chamberlain University, 2021). I am not sure which coefficient the authors are referring to in the article.
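To keep the two kinds of coefficient apart, here is a minimal sketch of a simple (one-variable) linear regression. It uses made-up numbers, not data from the article, and the function names `fit_line` and `r_squared` are only illustrative. The slope is the rise over the run of the fitted line, and the coefficient of determination is the explained variation divided by the total variation.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Coefficient of determination: explained variation / total variation."""
    mean_y = sum(y) / len(y)
    predicted = [intercept + slope * xi for xi in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)
    total = sum((yi - mean_y) ** 2 for yi in y)
    return explained / total

# Hypothetical data: age in years (independent) and BMI (dependent).
age = [25, 30, 35, 40, 45, 50, 55, 60]
bmi = [23.1, 24.0, 24.6, 25.9, 26.1, 27.4, 27.8, 29.0]

slope, intercept = fit_line(age, bmi)
r2 = r_squared(age, bmi, slope, intercept)

print(f"slope: {slope:.3f} BMI units per year of age")
print(f"intercept: {intercept:.3f}")
print(f"coefficient of determination: {r2:.3f}")
```

In this sketch the slope carries units (BMI points per year), while the coefficient of determination is a unitless proportion between 0 and 1. The article's charts come from quantile regression, which this simple least-squares example does not reproduce.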
The grey shaded areas around each line show the 95% confidence interval for the quantile estimates. It is interesting to note the narrowness of the spread of the confidence interval around the line in the “Age” graph and the “BMI at age 20” graphs in comparison to the other four graphs even though they are all at the 95% confidence level. We all know now that a narrow confidence interval is preferred over a wide one (Holmes et al., 2018).
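As a rough illustration of why a narrow interval signals a more precise estimate, the sketch below builds a 95% confidence interval for a sample mean using the normal approximation. This is a simplification: the article's intervals are for quantile regression coefficients, not means, and a t-based interval would be more appropriate for small samples. The data values and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical BMI values. The larger sample repeats the same values eight
# times, an artificial way to keep the spread similar while increasing n.
small_sample = [24.1, 27.3, 30.2, 22.8, 26.5]
large_sample = small_sample * 8

low_s, high_s = mean_confidence_interval(small_sample)
low_l, high_l = mean_confidence_interval(large_sample)

print(f"n = {len(small_sample)}: width = {high_s - low_s:.2f}")
print(f"n = {len(large_sample)}: width = {high_l - low_l:.2f}")
```

Both intervals are at the same 95% level, yet the one built from more observations is narrower, which is the same pattern as a graph whose shaded band hugs the line more tightly.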
To answer the final question, which statistic would show the value of that regression line in understanding BMI, I’d give more weight (pardon the pun) to the statistics of “Age” and “BMI at age 20” due to the narrowness of the confidence intervals, but also interesting is the way the “Years worked at night” regression line jumps at about the 80th quantile showing a suddenly stronger association in the upper quantiles. That would be an interesting area to investigate.
Chamberlain University. (2021). MATH225. Week 8 Slide Deck [Online lesson]. Downers Grove, IL: Adtalem.
Holmes, A., Illowsky, B., & Dean, S. (2018). Introductory business statistics. OpenStax.
Juvanhol, L.L., Lana, R.M., Cabrelli, R., Bastos, L.S., Nobre, A.A., Rotenberg, L., Griep, R.H. (2016). Factors associated with overweight: are the conclusions influenced by choice of the regression method? BMC Public Health 16, 642. http://doi.org/10.1186/s12889-016-3340-2