STA 3064 SAS Project A Overview
(15 points for idea discussion + 75 points for the written report = 90 points)

Project Description

The project provides an opportunity for you to collect, analyze, and interpret data using a simple linear regression model. You should find a topic of interest and explain why a statistical model is useful for studying it. Next, you must locate a source of data related to the topic, read the observations into SAS in an appropriately structured way, summarize the data, analyze the data using appropriate modeling techniques, and discuss the results. Each of these elements will be addressed in a written report. Each student will complete the project individually.

Please consider the following when planning your project:

1. Where possible, data should be read into SAS from its original source (website, database, text file, or Excel file). Externally read data is preferred over data lines in SAS.
2. All analyses will be run in SAS.
3. Only relevant code and output need to be included in your report (less is more).
4. Your project should follow the entire process of describing a real-world problem, translating it into a statistical problem, solving the statistical problem, and translating the solution into meaningful, discipline-specific findings.
5. Make sure, to the best of your understanding, that the data has not been analyzed before in the way you plan to analyze it for your project.

Choosing a Study

This project will involve finding a topic for which secondary data may be obtained. Finding data can be one of the biggest challenges and the most time-intensive part of the project, so you should start your search early. Start with topics that you are interested in or passionate about, and then scan journals, websites, and other publications related to those topics. Sources of data are readily available on the web.
Below are some places to help you start the process:

Google data set search: https://datasetsearch.research.google.com/
Sports-related sources: http://community.amstat.org/sis/sportsdataresources
Raw economic data: http://www.quandl.com/
Government-maintained data sets: http://catalog.data.gov/dataset
Weather data: http://www.weatherbase.com/
Health-related data: http://healthdata.gov/
Social research: https://usa.ipums.org/usa/
Links to public data: https://www.springboard.com/blog/free-public-data-sets-data-science-project/

If your study does not have readily available data, you may need to change topics to find one that does.

The data set you find should have the following characteristics:

1. The data set should have one variable that is a measurable outcome (a continuous variable) that can serve as your response. This variable should represent an outcome of some event or the output of a process.
2. The data set should also include one predictor variable that is quantitative (continuous and measurable). This predictor variable should represent an input to some process or event that potentially affects the response.
3. A good rule of thumb is to have at least 10 times as many observations (rows) as predictor variables in the study. With just one predictor, you should have at least 10 observations. For this study, at least 20 observations are preferred. Of course, if more observations exist, use them!
4. The data should be relevant to some research question of interest. Research questions or objective statements for the study should be formulated in the context of the application.
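
The workflow described above (read external data, summarize it, fit the model) might look roughly like the following in SAS. This is only a minimal sketch, not a required template: the file path, the data set name project, and the variable names y (response) and x (predictor) are hypothetical placeholders, and it assumes your data is a CSV file with variable names in the first row.

```sas
/* Hypothetical path, data set name, and variable names: replace with your own. */

/* Read the external CSV file into a SAS data set named WORK.PROJECT. */
proc import datafile="/home/your_userid/project/mydata.csv"
    out=work.project
    dbms=csv
    replace;
    getnames=yes;      /* first row holds the variable names */
    guessingrows=max;  /* scan all rows when guessing variable types */
run;

/* Summarize the response (y) and the predictor (x).
   The N column also shows how many observations are available. */
proc means data=work.project n mean std min max;
    var y x;
run;

/* Scatter plot to check whether a straight-line relationship is plausible. */
proc sgplot data=work.project;
    scatter x=x y=y;
run;

/* Fit the simple linear regression of y on x. */
proc reg data=work.project;
    model y = x;
run;
quit;
```

If your source is an Excel file rather than a CSV file, the import step would need a different dbms= value, and only the portions of the output that support your discussion belong in the report.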