Problem # 1.1. In the 2018 election for Senate in California, a CNN exit poll
of 1882 voters stated that 52.5% voted for the Democratic candidate, Dianne
Feinstein. Of all 11.1 million voters, 54.2% voted for Feinstein.
(a) What was the (i) subject, (ii) sample, (iii) population?
Your answer goes here
Problem # 1.2. The Students data file (Students.csv) shows responses of a class of 60 social
science graduate students at the University of Florida to a questionnaire that
asked about gender (1 = female, 0 = male), age, hsgpa = high school GPA (on a
four-point scale), cogpa = college GPA, dhome = distance (in miles) of the
campus from your home town, dres = distance (in miles) of the classroom from
your current residence, tv = average number of hours per week that you watch
TV, sport = average number of hours per week that you participate in sports or
have other physical exercise, news = number of times a week you read a
newspaper, aids = number of people you know who have died from AIDS or who are
HIV+, veg = whether you are a vegetarian (1 = yes, 0 = no), affil = political
affiliation (1 = Democrat, 2 = Republican, 3 = independent), ideol = political
ideology (1 = very liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5
= slightly conservative, 6 = conservative, 7 = very conservative), relig = how
often you attend religious services (0 = never, 1 = occasionally, 2 = most
weeks, 3 = every week), abor = opinion about whether abortion should be legal
in the first three months of pregnancy (1 = yes, 0 = no), affirm = support
affirmative action (1 = yes, 0 = no), and life = belief in life after death (1
= yes, 2 = no, 3 = undecided). You will use this data file for some exercises
in this book.
(a) Practice accessing a data file for statistical analysis with
your software by going to the book’s website and copying and then displaying
this data file.
Your answer goes here
(b) Using responses on abor, state a question that
could be addressed with (i) descriptive statistics, (ii) inferential statistics.
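For part (a), one possible way to copy and display the file is sketched below in Python, using only the standard library. It assumes that Students.csv has already been saved to the working directory, that it is comma-delimited, and that its first row holds the variable names listed above; none of this is confirmed by the problem statement. The function names parse_rows, load_rows, and preview are made up for this illustration.

```python
import csv

def parse_rows(lines):
    """Turn an iterable of text lines (header row first) into a list of dicts."""
    return list(csv.DictReader(lines))

def load_rows(path):
    """Read a comma-delimited file with a header row (assumed layout)."""
    with open(path, newline="") as f:
        return parse_rows(f)

def preview(rows, n=5):
    """Return the column names followed by the first n rows, as text lines."""
    if not rows:
        return ["(no rows)"]
    names = list(rows[0].keys())
    body = [" ".join(row[name] for name in names) for row in rows[:n]]
    return [" ".join(names)] + body

# Typical use, assuming the file was downloaded beforehand:
#   rows = load_rows("Students.csv")
#   print("\n".join(preview(rows)))
```

All values are read as text, so a column such as abor would need converting with int() before any counting or averaging.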
Problem # 1.3. Identify each of the following variables as
categorical or quantitative: (a) Number of smartphones that you own; (b) County
of residence; (c) Choice of diet (vegetarian, nonvegetarian); (d) Distance, in
kilometers, of commute to work.
Your answer goes here
Problem # 1.4. Give an example of a variable that is (a)
categorical; (b) quantitative; (c) discrete; (d) continuous.
Your answer goes here
Problem # 1.10. Analyze the Carbon_West data file (Carbon_West.csv) at the book’s website by
(a) constructing a frequency distribution and a histogram,
(b) finding the mean, median, and standard
deviation. Interpret each.
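The two steps in this problem can be sketched with Python's standard library as follows. The helper names and the interval width are choices made for this illustration, and the toy values are hypothetical, not taken from Carbon_West.csv. The standard deviation uses the n - 1 divisor, which is an assumption about the intended definition.

```python
import statistics
from collections import Counter

def frequency_distribution(values, width):
    """Count observations in equal-width intervals [k*width, (k+1)*width)."""
    bins = Counter(int(v // width) for v in values)
    first, last = min(bins), max(bins)
    # Include empty intervals so that gaps in the data stay visible.
    return [((k * width, (k + 1) * width), bins[k]) for k in range(first, last + 1)]

def text_histogram(table):
    """Render the frequency table as rows of asterisks, one per observation."""
    return "\n".join(f"{low:>8} to {high:<8} {'*' * count}" for (low, high), count in table)

def center_and_spread(values):
    """Mean, median, and sample standard deviation (n - 1 divisor)."""
    return statistics.mean(values), statistics.median(values), statistics.stdev(values)

# Hypothetical values for illustration only; not taken from Carbon_West.csv.
toy = [1.2, 2.5, 2.9, 3.1, 3.4, 4.8, 9.6]
table = frequency_distribution(toy, width=2)
print(text_histogram(table))
print(center_and_spread(toy))
```

A different interval width can change the apparent shape of the histogram, so it is worth trying more than one before interpreting it.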
Problem # 1.11. According to Statistics Canada, for the
Canadian population having income in 2019, annual income had a median of
$35,000 and mean of $46,700. What would you predict about the shape of the
distribution? Why?
Your answer goes here
Problem # 1.13. A report indicates that public school
teachers’ annual salaries in New York City have an approximate mean of $69,000
and standard deviation of $6,000. If the distribution has approximately a bell
shape, report intervals that contain about (a) 68%, (b) 95%, (c) all or nearly
all salaries. Would a salary of $100,000 be unusual? Why?
Your answer goes here
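The arithmetic behind the bell-shape rule of thumb (about 68%, 95%, and nearly all observations within 1, 2, and 3 standard deviations of the mean) can be sketched as below. The function names are made up for this illustration, and the interpretation of the printed numbers is left to the answer.

```python
def empirical_rule_intervals(mean, sd):
    """Return mean +/- k*sd for k = 1, 2, 3 (about 68%, 95%, nearly all)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Mean and standard deviation as stated in the problem.
intervals = empirical_rule_intervals(69_000, 6_000)
for k, (low, high) in intervals.items():
    print(f"within {k} sd: {low:,} to {high:,}")
print(f"z-score of 100,000: {z_score(100_000, 69_000, 6_000):.2f}")
```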
From the Murder data file (Murder.csv)
at the book’s website, use the variable murder, which is the murder rate (per
100,000 population) for each state in the U.S. in 2017 according to the FBI
Uniform Crime Reports. At first, do not use the observation for D.C. (DC).
Using software:
(a) Find the mean and standard deviation and interpret their
values.
(b) Find the five-number summary, and construct the
corresponding box plot. Interpret.
(c) Now include the observation for D.C. What is affected
more by this outlier: The mean or the median? The range or the interquartile
range?
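A minimal sketch of parts (b) and (c) in Python follows. It uses statistics.quantiles with method="inclusive"; software packages differ in how they compute quartiles, so other tools may report slightly different values. The function names are made up for this illustration, and the toy values are hypothetical, not taken from Murder.csv.

```python
import statistics

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def change_when_added(values, extra):
    """Change in mean, median, range, and IQR after adding one observation."""
    def summarize(xs):
        low, q1, med, q3, high = five_number_summary(xs)
        return {"mean": statistics.mean(xs), "median": med,
                "range": high - low, "iqr": q3 - q1}
    before = summarize(values)
    after = summarize(list(values) + [extra])
    return {name: after[name] - before[name] for name in before}

# Hypothetical values for illustration only; not taken from Murder.csv.
toy = [1, 2, 3, 4, 5]
print(five_number_summary(toy))
print(change_when_added(toy, 100))
```

Running change_when_added on the real murder rates, with the D.C. value as the extra observation, gives the four changes to compare.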
Problem # 1.18. The Income data file (Income.csv) at the book’s website reports
annual income values in the U.S., in thousands of dollars.
(a) Using software, construct a histogram. Describe its shape.
(b) Find descriptive statistics to summarize the data. Interpret them.
(c) The kernel density estimation method finds a smooth-curve approximation for
a histogram. At each value, it takes into account how many observations are
nearby and their distance, with more weight given to those closer. Increasing the
bandwidth increases the influence of observations farther away. Plot a
smooth-curve approximation for the histogram of income values. Summarize the
impact of increasing and of decreasing the bandwidth substantially from the
default value.
(d) Construct and interpret side-by-side box plots of income by
race (B = Black, H = Hispanic, W = White). Compare the incomes using numerical
descriptive statistics.
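The description of kernel density estimation in part (c) can be made concrete with a short sketch. It assumes a Gaussian kernel, which the problem does not specify, and the function name gaussian_kde is made up for this illustration. The estimate at x averages one bell-shaped bump per observation, so nearby observations contribute more, and a larger bandwidth widens each bump. The toy values are hypothetical, not taken from Income.csv.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x (Gaussian kernel)."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each observation adds a bump centered at itself; closer ones weigh more.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

# Hypothetical values for illustration only; not taken from Income.csv.
toy = [10, 12, 30]
narrow = gaussian_kde(toy, bandwidth=1)
wide = gaussian_kde(toy, bandwidth=10)
for x in (11, 20, 30):
    print(x, round(narrow(x), 4), round(wide(x), 4))
```

With the small bandwidth the curve is spiky around the observations and close to zero between them. With the large bandwidth it is flatter and fills in the gap.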
Problem # 1.19. The Houses data file (Houses.csv) at the book’s website lists the selling price
(thousands of dollars), size (square feet), tax bill (dollars), number of
bathrooms, number of bedrooms, and whether the house is new (1 = yes, 0 = no)
for 100 home sales in Gainesville, Florida. Let’s analyze the selling prices.
(a) Construct a frequency distribution and a histogram. Describe the shape.
(b) Find the percentage of observations that fall within one standard deviation
of the mean. Why is this not close to 68%?
(c) Construct a box plot, and interpret.
(d) Use descriptive statistics to compare selling prices according to whether the
house is new.
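The calculation in part (b) can be sketched as follows, using the sample standard deviation (n - 1 divisor) as an assumption about the intended definition. The function name is made up for this illustration, and the toy values are hypothetical, not taken from Houses.csv.

```python
import statistics

def percent_within_one_sd(values):
    """Percentage of observations within one standard deviation of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if mean - sd <= v <= mean + sd)
    return 100 * inside / len(values)

# Hypothetical values for illustration only; not taken from Houses.csv.
symmetric_toy = [1, 2, 3, 4, 5]
skewed_toy = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(percent_within_one_sd(symmetric_toy))
print(percent_within_one_sd(skewed_toy))
```

The second toy list shows how a single very large value inflates the standard deviation, so that the one-standard-deviation interval captures far more than 68% of the observations.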