CURTIN UNIVERSITY

School of Population Health

EPID6001 Quantitative Methods (Curtin) / EPID6002 Quantitative Methods (OUA)

Assignment: Case Study Application (CSA)

Session/Semester 2, 2023

PLEASE READ THE INSTRUCTIONS CAREFULLY BEFORE YOU COMMENCE.

INSTRUCTIONS:

• This assessment is a written assignment, due on Monday of Week 14, 23 October 2023, by 3 pm (AWST).

• This assignment is worth 55 marks in total and counts for 50% of your final mark for this unit. It is a requirement of this course that all assessments be completed on an independent basis (i.e., your own work).

• To complete the assignment

1) Step 1: Click Assessments → Assessment 3 – Case Study Application (CSA), and carefully READ the Assessment Information.

2) Step 2: Download the assignment CSA Assignment Sem2 2023.doc to your own computer. Open the assignment and sign/type the Declaration form electronically.

3) Step 3:

a. For Case Study Application One: Download the given dataset to your own computer. Open the dataset using Stata. Answer all questions listed in the CSA assignment. You must use Stata to complete this assignment. You also need to copy and paste the relevant Stata outputs into the assignment under the questions that ask for them, but do not include more than one copy of each table/graph. Marks will be deducted for missing outputs. Do not submit Stata output separately from your assignment.

b. For Case Study Application Two: Download the relevant paper (please carefully READ the Assessment Information to understand which paper you should use) to your own computer. Open the paper and answer all questions listed in the CSA assignment.

4) Step 4: Save your completed assignment as one Word document (other formats will not be accepted for marking) named “SURNAME_StudentID_CSA.docx”. For example, George Smith will save his assignment as “SMITH_12345678_CSA.docx”. You are then ready to submit.

• To submit your assignment

1) Step 1: Submit your assignment to Turnitin via “Assignment (CSA) Sem2 2023”.

2) Step 2: Revise your assignment according to the Turnitin Originality Report from Step 1, then resubmit the revised assignment to “Assignment (CSA) Sem2 2023: Revision 1”.

3) Step 3: Revise your assignment according to the Turnitin Originality Report from Step 2, then resubmit the further revised assignment to “Assignment (CSA) Sem2 2023: Final”. This is the final version for marking.

• Please note

1) Use of generative artificial intelligence (AI) is not permitted for this assessment.

2) The assignment will not be accepted unless the Declaration below is signed (or typed). All forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University. Please make sure you avoid plagiarism (see https://www.curtin.edu.au/students/essentials/rights/academic-integrity/).

3) Do plan well to avoid a late submission!

4) Late assessment policy (details can be found in the unit outline): this policy ensures that the requirements for submission of assignments and other work to be assessed are fair, transparent and equitable, and that penalties are consistently applied in this unit. If a student does not have an approved assessment extension:

i. For assessment items submitted within the first 24 hours after the due date/time, students will be penalised by a deduction of 5% of the total marks allocated for the assessment task;

ii. For each additional 24-hour period commenced, an additional penalty of 10% of the total marks allocated for the assessment item will be deducted; and

iii. Assessment items submitted more than 168 hours late (7 calendar days) will receive a mark of zero.

5) You are required to keep a copy of the completed assignment for your own record.
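As an illustration only, the late penalty schedule in item 4) can be sketched in Python. The function name `late_penalty_percent` is hypothetical. The sketch assumes one reading of the policy: any part of a 24-hour period counts as a period commenced, and a submission exactly 168 hours late is still penalised rather than given zero. The unit outline remains the authoritative source.

```python
import math


def late_penalty_percent(hours_late: float) -> float:
    """Percentage of the total marks deducted for a late submission.

    Assumed reading of the policy (not an official calculator):
    - first 24 hours late: 5%
    - each additional 24-hour period commenced: a further 10%
    - more than 168 hours late: mark of zero (returned here as 100%)
    """
    if hours_late <= 0:
        return 0.0  # on time, no penalty
    if hours_late > 168:
        return 100.0  # mark of zero
    periods_commenced = math.ceil(hours_late / 24)
    return 5.0 + 10.0 * (periods_commenced - 1)


# Example: 25 hours late falls in the second 24-hour period.
print(late_penalty_percent(25))
```

The returned percentage applies to the total marks allocated for the assessment (55 marks for this assignment).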

Please do contact your lecturer/tutor if you have any queries not covered in the explanations given: Dr Yun Zhao: y.zhao@curtin.edu.au

Declaration

As I type (sign) my name below,

• I declare that the submitted assignment/assessment is my own work and has not previously been submitted for assessment.

• I have conducted the analyses, interpreted and answered all questions in this assignment/assessment independently, without any collaboration, collusion or academic misconduct.

• This work complies with Curtin University rules concerning plagiarism and copyright.

• I understand that all forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University.

• I unconditionally accept any action that may be taken should Curtin University consider that an infringement of the Statute No.10 – Student Discipline has occurred (see page 9 of unit outline).

__________________________ ______________________ _______________

Name & ID of student       Signature of student   Date

Case Study Application One

(All students need to complete this Case Study Application One)

(Total: 35 marks)

A researcher would like to identify factors associated with breastfeeding status at 6 months postpartum. Using a random sample (n=608) from a study, the researcher collected some relevant variables (including antenatal class attendance, delivery method, pacifier use, infant formula use and infant age when solid food was introduced) and saved them in the dataset CSA DatasetForQ1 BF623 Sem2 2023.dta. The variables in the dataset are described below:

Variable Name     Description

BFStatus          Breastfeeding status at 6th month (0 = NotBF, 1 = YesBF)

MumAge            Mother’s age (1 = <25 yrs, 2 = 25 to 29 yrs, 3 = 30+ yrs)

BWT               Birth weight (1 = 2500 to 2999 grams, 2 = 3000 to 3499 grams, 3 = 3500+ grams)

AntenatalClass    Attended any classes about breastfeeding during pregnancy (1 = Yes, 2 = No)

DeliveryMethod    Delivery method (1 = Normal delivery, 2 = C-section)

PacifierUse       Pacifier use (0 = No, 1 = Yes)

FormulaUse        Infant formula used after delivery at hospital (0 = No, 1 = Yes)

AgeSolids         Infant age when solid food was introduced (in weeks)

The main research question of this study is “whether infant age when solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum”. In addition, the researcher would like to understand the association between pacifier use (or formula use) and breastfeeding status at 6 months postpartum.

Furthermore, the researcher would like to predict the probability of breastfeeding for infants with different personal characteristics. Use a 5% significance level for all statistical tests and conclusions. Use evidence (e.g., p values) from Stata outputs to support your answers.

Hint: You may find it helpful to follow the strategy for analyses given in computing lab Logistic Regression I & II.

1. (3 marks) Given this data, to answer the research questions, you need to help the researcher identify:

1.1. Which variable is the dependent variable (DV)?

1.2. Which variables are the independent variables (IV)?

1.3. Does the researcher have a primary study variable of interest? If yes, which one?

1.4. Which kind of regression analysis should the researcher use? (Briefly justify your suggestion.)

Your Answer:

__________________________________________________________________________

__________________________________________________________________________

__________________________________________________________________________

2. (4 marks) The researcher would like to assess whether any one of the following variables (i.e., AntenatalClass, DeliveryMethod, PacifierUse, FormulaUse) may confound the association between AgeSolids and the breastfeeding status at 6 months postpartum. You need to help the researcher assess the possible confounding effect using the steps covered in our lectures/labs and fill in the table below. [Hint: assess the confounding effect of each variable separately. You can choose to answer any two of the questions, e.g., a) and d)].

Attach relevant Stata outputs here.

Question: whether any one of the following variables is a confounder Your conclusion/comments and supporting evidence based on your outputs

a) Is AntenatalClass a confounder?

b) Is DeliveryMethod a confounder?

c) Is PacifierUse a confounder?

d) Is FormulaUse a confounder?

3. (3 marks) The researcher would like to know whether the association between AgeSolids and the breastfeeding status at 6 months postpartum is modified by any one of the following variables (i.e., AntenatalClass, DeliveryMethod, PacifierUse, FormulaUse) independently. Help the researcher answer the following questions with evidence from your analysis [Hint: assess the effect modification of each variable separately. You can choose to answer any two of the questions, e.g., f) and g)].

Attach relevant Stata outputs here.

Question: whether the effect of AgeSolids on breastfeeding status at 6 months is modified by any one of the given IVs? Your conclusion/comments and supporting evidence based on your output

e) Do AntenatalClass and AgeSolids interact with each other?

f) Do DeliveryMethod and AgeSolids interact with each other?

g) Do PacifierUse and AgeSolids interact with each other?

h) Do FormulaUse and AgeSolids interact with each other?

4. (10 marks) To answer the main research question, “whether infant age when solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum”, the researcher asks for your help to build an appropriate parsimonious regression model.

4.1 Perform the multiple regression analysis that you recommended (in Question 1) without consideration of any interactions. List the factors/predictors that are significantly associated with breastfeeding status at 6 months in your final parsimonious regression model in the table below. (4 marks)

Attach Stata outputs (Hint: report a table with odds ratios) here to show, step by step, how you arrived at the final parsimonious model.

Factor name in your final model    Adjusted odds ratio    95% Confidence Interval    p-value

Note: round your figures to 3 decimal places and clearly indicate the reference group.

4.2 (6 Marks) Interpret the adjusted odds ratios and corresponding 95% CI related to each relevant factor you listed in the above table and answer the following research questions. Please note you need to use evidence from the Stata output to support your answers.

Research question 1): “whether infant age when solid food was introduced is significantly associated with breastfeeding status at 6 months postpartum”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

Research question 2): “whether pacifier use is significantly associated with breastfeeding status at 6 months postpartum”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

Research question 3): “whether feeding infant formula after delivery at hospital is significantly associated with breastfeeding status at 6 months postpartum”

Your Answer:

___________________________________________________________

___________________________________________________________

___________________________________________________________

___________________________________________________________

5. (7 marks) Using your final parsimonious regression model in Question 4.1, the researcher would like to predict the probability of being breastfed at 6 months postpartum for some infants based on their own specific information. He asks for your help to build an appropriate regression model for this purpose.

Attach Stata output (Hint: report a table with coefficients) here.

5.1 (2 marks) Develop the multiple regression equation (with coefficients rounded to 3 decimal places) based on your Stata output. P is the probability of breastfeeding at 6 months postpartum.

=______________________________________________________________

5.2 (4 marks) Now based on the above model, help the researcher calculate the predicted probability P of being breastfed at 6 months postpartum.

a) For an infant who was not given any infant formula after delivery at hospital, but used a pacifier and was first given solid food in week 26. Show your steps of the prediction and make a brief comment on your prediction. (2 Marks)

Your answer:

___________________________________________________________________

___________________________________________________________________

b) For an infant who was given infant formula after delivery at hospital, but never used a pacifier and was first given solid food late, in week 30. Show your steps of the prediction and make a brief comment on your prediction. (2 Marks)

Your answer:

___________________________________________________________________

___________________________________________________________________

5.3 (1 Mark) Two friends of the researcher differ in age by 5 years, i.e., one is 25 years old and the other is 30 years old. However, their infants were first given solid food at the same week of age, and both infants had exactly the same status for formula and pacifier use. Using the model in Question 5.1, the researcher concluded that the probabilities of breastfeeding at 6 months postpartum for his two friends differ because of the 5-year gap in their ages and their babies’ conditions. Do you agree with the researcher?

Yes. I agree. Justify your agreement.

_______________________________________________________________________

_______________________________________________________________________

No. I disagree. Justify your disagreement.

_______________________________________________________________________

_______________________________________________________________________
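As a general reminder for Questions 5.1 and 5.2, a fitted logistic regression gives the log odds, and the predicted probability follows from the inverse logit, P = 1 / (1 + exp(-logit)). The sketch below assumes a logistic model. The function name, predictor names and coefficient values are hypothetical placeholders and are not estimates from the assignment dataset.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Return P = 1 / (1 + exp(-logit)) for one individual.

    coefficients and values are dicts keyed by predictor name.
    """
    # Linear predictor (log odds): b0 + b1*x1 + b2*x2 + ...
    logit = intercept + sum(b * values[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical placeholder coefficients and covariate values (NOT from the dataset).
example_coefficients = {"x1": 0.5, "x2": -1.0}
example_values = {"x1": 2.0, "x2": 1.0}
p = predicted_probability(-0.2, example_coefficients, example_values)
print(round(p, 3))
```

A predictor that is not in the fitted model has no entry in `coefficients`, so changing its value cannot change the predicted probability.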

6. (8 marks) The researcher further categorized the continuous variable AgeSolids into a 3-level categorical variable AgeSolidsCat (see table below). He asks for your help to build a new model in which the continuous variable AgeSolids in your parsimonious regression model (see Question 4.1) is replaced by the categorical variable AgeSolidsCat and all other variables are retained.

Attach Stata output (e.g., parameter estimates table with odds ratios) here.

Variable Name     Description

AgeSolidsCat      Categorized based on AgeSolids: (1 = ≤20 weeks, 2 = 21-25 weeks, 3 = ≥26 weeks)

6.1 (1 Mark) Do you think AgeSolidsCat is still a significant predictor of Breastfeeding status at 6th month? Justify your answer. Attach Stata output here.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.2 (3 Marks) The researcher would like to know “based on the current sample, from which infant age at introduction of solid foods (or later) are the odds of breastfeeding at 6 months significantly increased?” Help the researcher answer this question by interpreting the adjusted odds ratios (and 95% CI) related to AgeSolidsCat. You need to use evidence from the Stata output to support your answer.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.3 (2 Marks) The researcher is unsure which of the two models is better: the one with the continuous AgeSolids (see Question 4.1) or the one with the categorical AgeSolidsCat (this Question 6). Advise the researcher on which model you would use to explain the association between AgeSolids and the breastfeeding status at 6 months postpartum. Justify your choice using evidence.

Your answer:

__________________________________________________________________

__________________________________________________________________

6.4 (2 Marks) The researcher is happy with your help on his data analysis and would like your further advice on his new prospective cohort study with 6 months of follow-up. This study aims to identify the factors associated with the length of stay (LoS) at hospital among patients with a type of lung disease. Variables collected include main outcomes [LoS in days, and discharge status], demographic and socioeconomic characteristics [age (in years), gender, smoking status, BMI, education, marital status, job type, insurance status], and clinical information [symptoms of the disease, severity of the disease, comorbidities and medical history]. He needs your advice on

i. which methods he should use to describe the distribution of the LoS at hospital,

ii. which methods he should use to compare the LoS at hospital between groups (for example, between genders or between smoking groups),

iii. which of the regression models covered in EPID6001 is an appropriate model that he should use to identify the factors associated with the length of stay (LoS).

You need to briefly justify your answers.

Your answer:

__________________________________________________________________

__________________________________________________________________

Case Study Application Two

(Only students with a Random Allocation number “1” need to complete this Case Study Application Two using the following paper CSA PaperForQ2 Sem2 2023#1.pdf).

(Total: 20 marks)

This case study application uses information from a published paper “Ekholuenetale M, Wegbom AI, Tudeme G, Onikan A. Household factors associated with infant and under-five mortality in sub-Saharan Africa countries. Int J Child Care Educ Policy. (2020) 14:1–15. doi: 10.1186/s40723-020-00075-1” (It is attached with the assignment as CSA PaperForQ2 Sem2 2023#1.pdf in Blackboard).

1. (6 marks) Briefly describe the study to answer the following questions:

1.1. What were the study design and research aim?

1.2. How were the participants recruited: where, when, and how many?

1.3. List and briefly comment on two main strengths and two limitations of the study.

Your Answers:

___________________________________________________________

___________________________________________________________

2. (6 marks) Based on the paper information, answer the following questions:

2.1 What are the events of interest in this paper? How did the authors calculate the survival time?

2.2 List possible reasons for censored observations.

Your Answers:

___________________________________________________________

___________________________________________________________

2.3 Considering only “under-five mortality”, complete the table below for 10 children in sub-Saharan Africa countries with different conditions, where censoring status is coded “1” for event and “0” for censored. Use 30 days = 4 weeks = 1 month, 12 months = 1 year, 60 months = 5 years in the calculation.

id    Conditions    Survival time (months)    Censoring status

1 Died 55.5 months after birth

2 Died 44 weeks after birth

3 Still alive at the last follow-up time

4 Alive at 300 days after birth, but mother refused further follow-up

5 Lost to follow-up because the family moved to another city, but still alive at 1.5 years after birth

6 Died at 2.25 years after birth

7 Still alive at 54 weeks after birth, but contact was lost

8 Drop-out, i.e., follow-up was discontinued, but alive at 69 weeks after birth

9 Died at 28 days after birth

10 Alive and celebrated fifth birthday
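The conversion rules stated in Question 2.3 (30 days = 4 weeks = 1 month, 12 months = 1 year) can be sketched as a small Python helper. The function name `to_months` and the example values are hypothetical and are deliberately not taken from the table.

```python
def to_months(value: float, unit: str) -> float:
    """Convert a duration to months using the assignment's stated rules:
    30 days = 4 weeks = 1 month, and 12 months = 1 year."""
    if unit == "days":
        return value / 30
    if unit == "weeks":
        return value / 4
    if unit == "months":
        return value
    if unit == "years":
        return value * 12
    raise ValueError(f"unknown unit: {unit}")


# Illustrative values only (not rows of the table above).
print(to_months(90, "days"))    # 90 days under the 30-day rule
print(to_months(8, "weeks"))    # 8 weeks under the 4-week rule
print(to_months(0.5, "years"))  # half a year under the 12-month rule
```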

3. (8 marks) Based on the paper information, answer the following questions:

3.1 Which statistical regression analysis was used in this paper to achieve its research objective? Do you think the authors used a correct model? And why?

Your Answers

___________________________________________________________

___________________________________________________________

3.2 Did the authors assess the assumption associated with the regression model used? List at least two methods for assessing it that you learnt in our unit.

Your Answers

___________________________________________________________

___________________________________________________________

3.3 Do you think “Household wealth quintiles” in Model II (see Table 3 and Table 4) is a significant factor? If you were the authors, which Stata syntax/command could you use to obtain an overall p-value for this factor?

Your Answers

___________________________________________________________

___________________________________________________________

3.4 Referring only to Table 3 of this paper, choose one of the factors in Model II and interpret its effect (along with the corresponding 95% CI) on infant mortality in your own words.

Your Answers

___________________________________________________________

___________________________________________________________

3.5 Based on Table 3 and Table 4, do you think Model II and Model IV are parsimonious models? Justify your answer using evidence from the two tables. List a regression method you learnt in our unit for building a parsimonious model.

Your Answers

___________________________________________________________

___________________________________________________________

Case Study Application Two

(Only students with a Random Allocation number “2” need to complete this Case Study Application Two using the following paper CSA PaperForQ2 Sem2 2023#2.pdf).

(Total: 20 marks)

This case study application uses information from a published paper “David T Doku, Subas Neupane, Survival analysis of the association between antenatal care attendance and neonatal mortality in 57 low- and middle-income countries, International Journal of Epidemiology, Volume 46, Issue 5, October 2017, Pages 1668–1677, https://doi.org/10.1093/ije/dyx125” (It is attached with the assignment as CSA PaperForQ2 Sem2 2023#2.pdf in Blackboard).

1. (6 marks) Briefly describe the study to answer the following questions:

1.1 What were the study design and research aim?

1.2 How were the participants recruited: where, when, and how many?

1.3 List and briefly comment on two main strengths and two limitations of the study.

Your Answers:

___________________________________________________________

___________________________________________________________

2. (6 marks) Based on the paper information, answer the following questions:

2.1 What are the events of interest in this paper? How did the authors calculate the survival time?

2.2 List possible reasons for censored observations.

Your Answers:

___________________________________________________________

___________________________________________________________

2.3 Complete the table below for 10 neonates in one of the 57 low- and middle-income countries with different conditions, where censoring status is coded “1” for event and “0” for censored. Use 30 days = 4 weeks = 1 month, 12 months = 1 year, 60 months = 5 years in the calculation.

id    Conditions    Survival time (days)    Censoring status

1 Died 2.5 days after birth

2 Died 6 hours after birth

3 Still alive at the last follow-up time

4 Alive at 120 hours after birth, but mother refused further follow-up

5 Lost to follow-up because the family moved to another city, but still alive on the 7th day after birth

6 Died at 12 hours after birth

7 Still alive at the last follow-up time

8 Drop-out, i.e., follow-up was discontinued, but alive on the 4th day after birth

9 Died at 18 hours after birth

10 Died 108 hours after birth
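Several conditions in this table are stated in hours while survival time is recorded in days. A minimal Python sketch of that conversion follows. It assumes 24 hours = 1 day, which the question does not state explicitly. The function name `hours_to_days` and the example values are hypothetical and are not taken from the table.

```python
HOURS_PER_DAY = 24  # assumption: not listed among the question's conversion rules


def hours_to_days(hours: float) -> float:
    """Convert a duration in hours to days."""
    return hours / HOURS_PER_DAY


# Illustrative values only (not rows of the table above).
print(hours_to_days(36))
print(hours_to_days(48))
```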

3. (8 marks) Based on the paper information, answer the following questions:

3.1 Which statistical regression analysis was used in this paper to achieve its research objective? Do you think the authors used a correct model? And why?

Your Answers

___________________________________________________________

___________________________________________________________

3.2 Did the authors assess the assumption associated with the regression model used? List at least two methods for assessing it that you learnt in our unit.

Your Answers

___________________________________________________________

___________________________________________________________

3.3 Do you think “Number of ANC visits” (see Table 2) is a significant factor? If you were the authors, which Stata syntax/command could you use to obtain an overall p-value for this factor?

Your Answers

___________________________________________________________

___________________________________________________________

3.4 Referring only to Table 2 of this paper, choose one of the factors and interpret its effect (along with the corresponding 95% CI) on neonatal mortality in your own words.

Your Answers

___________________________________________________________

___________________________________________________________

3.5 Based on Figure 3, the authors concluded that “The Europe and Central Asia region experienced better survival, whereas the South Asia region had the worst survival”. Do you agree with their conclusion? Do you think Africa experienced better neonatal survival compared with the other three regions (East Asia & Pacific, Latin America & Caribbean, and Middle East & North Africa)? Justify your answer using evidence from Figure 3.

Your Answers