Project
100 points + 10 bonus points
Note: This is an individual assignment. Each student MUST complete the work on his/her own.
Code sharing and plagiarism will not be tolerated.
Overview
This project consists of three tasks. The goal is to apply what we have learned to solve real problems in Data Science and Machine Learning. Glance at “What to Submit” when you start working on a task so that you know what information to provide for each task.
Submission Example
csci333-project-XX/
    csci333-project-XX.doc
    task1XX.py
    task2XX.py
    task3XX.py
    README.txt
What to Submit
1. One doc file, “csci333-project-XX.doc”, containing the source code (as text) and screenshots of the output of all programs. Replace XX with your first and last name. You can copy and paste the source code from PyCharm or another IDE into the doc file. The output screenshots should show that your programs pass their tests and work correctly.
2. The Python files for all programs. Well-defined programs must include proper comments; programs without comments will lose a significant number of points.
Task 1 (20 points): (Class) Write a class named Pet, which should have the following data attributes:
– name (for the name of a pet)
– animal_type (for the type of animal that a pet is; example values are ‘Dog’, ‘Cat’, and ‘Bird’)
– age (for the pet’s age)
The Pet class should have an __init__ method that creates these attributes. It should also have the following methods:
– set_name – This method assigns a value to the name field.
– set_animal_type – This method assigns a value to the animal_type field.
– set_age – This method assigns a value to the age field.
– get_name – This method returns the value of the name field.
– get_animal_type – This method returns the value of the animal_type field.
– get_age – This method returns the value of the age field.
Once you have written the class, write a program that creates an object of the class and prompts the user to enter the name, type, and age of his or her pet. This data should be stored as the object’s attributes. Use the object’s accessor methods to retrieve the pet’s name, type, and age and display this data on the screen.
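A minimal sketch of one way Task 1 could look, assuming the attribute names name, animal_type, and age. Making the attributes private with double underscores, storing the age as an integer, and naming the driver function main() are design choices made here for illustration, not requirements stated above.

```python
class Pet:
    """Stores a pet's name, animal type, and age."""

    def __init__(self, name="", animal_type="", age=0):
        # Create the three data attributes. The double underscores make
        # them private, so code outside the class uses the methods below.
        self.__name = name
        self.__animal_type = animal_type
        self.__age = age

    # Mutator methods: assign a new value to one attribute.
    def set_name(self, name):
        self.__name = name

    def set_animal_type(self, animal_type):
        self.__animal_type = animal_type

    def set_age(self, age):
        self.__age = age

    # Accessor methods: return the current value of one attribute.
    def get_name(self):
        return self.__name

    def get_animal_type(self):
        return self.__animal_type

    def get_age(self):
        return self.__age


def main():
    """Prompt for a pet's data, store it in a Pet object, and display it."""
    pet = Pet()
    pet.set_name(input("Enter the pet's name: "))
    pet.set_animal_type(input("Enter the type of animal (e.g., Dog, Cat, Bird): "))
    pet.set_age(int(input("Enter the pet's age: ")))  # raises ValueError if not an integer

    # Retrieve the stored data through the accessor methods.
    print("Name:", pet.get_name())
    print("Type:", pet.get_animal_type())
    print("Age:", pet.get_age())


# main() is defined but not called here; add a call to main() at the
# bottom of the file to run the interactive program.
```

Because the attributes are private, a statement such as print(pet.__name) outside the class fails with an AttributeError, which is why the program goes through get_name() and the other accessors.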
Grading Rubric
– 10 points for defining the class and all methods.
– 5 points for a runnable Python program with correct testing and display.
– 5 points for appropriate comments and screenshots of the output of this program.
Task 2 (30 points): (Intro to Data Science: pandas DataFrames) Write a program that performs the following tasks with pandas DataFrames:
(a) Create a DataFrame named temperatures from a dictionary of three temperature readings each for three people: ‘Maxine’, ‘James’, and ‘Amanda’.
(b) Recreate the DataFrame temperatures from Part (a) with custom indices, using the index keyword argument and a list containing ‘Morning’, ‘Afternoon’, and ‘Evening’.
(c) Select from temperatures the column of temperature readings for ‘Maxine’.
(d) Select from temperatures the row of ‘Morning’ temperature readings.
(e) Select from temperatures the rows of ‘Morning’ and ‘Evening’ temperature readings.
(f) Select from temperatures the columns of temperature readings for ‘Amanda’ and ‘Maxine’.
(g) Select from temperatures the elements for ‘Amanda’ and ‘Maxine’ in the ‘Morning’ and ‘Afternoon’ readings.
(h) Use the describe() method to produce temperatures’ descriptive statistics.
(i) Transpose temperatures (an example can be found at https://www.geeksforgeeks.org/python-pandas-dataframe-transpose/).
(j) Sort temperatures so that its column names are in alphabetical order.
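The steps (a) through (j) map onto a handful of pandas operations. The sketch below shows one possible mapping; the temperature values are hypothetical because the assignment does not specify any, and the results dictionary is only a convenience for printing each step with a label.

```python
import pandas as pd

# Hypothetical readings in degrees Fahrenheit; the assignment does not give values.
readings = {'Maxine': [98.6, 98.9, 100.2],
            'James': [98.9, 99.1, 98.4],
            'Amanda': [97.5, 98.2, 98.7]}

# (a) DataFrame with the default integer row index 0, 1, 2
temperatures = pd.DataFrame(readings)
print('(a) default index')
print(temperatures, end='\n\n')

# (b) The same data with custom row labels
temperatures = pd.DataFrame(readings, index=['Morning', 'Afternoon', 'Evening'])

results = {
    '(b) custom index': temperatures,
    '(c) Maxine column': temperatures['Maxine'],            # one column -> Series
    '(d) Morning row': temperatures.loc['Morning'],         # one row -> Series
    '(e) Morning and Evening rows': temperatures.loc[['Morning', 'Evening']],
    '(f) Amanda and Maxine columns': temperatures[['Amanda', 'Maxine']],
    '(g) selected elements':
        temperatures.loc[['Morning', 'Afternoon'], ['Amanda', 'Maxine']],
    '(h) descriptive statistics': temperatures.describe(),  # count, mean, std, ...
    '(i) transposed': temperatures.T,                       # rows and columns swapped
    '(j) sorted columns': temperatures.sort_index(axis=1),  # column labels A to Z
}

for title, result in results.items():
    print(title)
    print(result, end='\n\n')
```

Indexing with a single label returns a Series, while indexing with a list of labels returns a DataFrame; that is why (c) and (d) print differently from (e) and (f). For (j), sort_index(axis=1) orders the columns by their labels rather than by their values.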
Grading Rubric
– 10 points for defining functions.
– 5 points for finishing Task 2(a)-(j).
– 5 points for appropriate comments and necessary screenshots of the program.
– 10 points for a runnable Python program with correct data visualization.
Task 3 (50 points): (Classification with k-Nearest Neighbors and the Digits Dataset) Read the Python program “CaseStudyDemo.py” to learn how the k-Nearest Neighbors algorithm is applied to the Digits dataset to recognize handwritten digits.
Rewrite the Python program to complete the following subtasks:
(a) Write code to display the two-dimensional array representing the sample image at index 35 and the numeric value of the digit the image represents.
(b) Write code to display the image for the sample image at index 35 of the Digits dataset.
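For subtasks (a) and (b), a minimal sketch assuming scikit-learn's load_digits and matplotlib are installed: digits.images holds each sample as an 8x8 array of pixel intensities and digits.target holds the digit each sample represents. The helper names describe_sample and show_sample and the gray_r colormap are illustrative choices, not taken from CaseStudyDemo.py.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

SAMPLE_INDEX = 35  # index required by subtasks (a) and (b)


def describe_sample(digits, index):
    """Return the 8x8 pixel array and the digit label for one sample."""
    return digits.images[index], digits.target[index]


def show_sample(digits, index):
    """Draw one sample as a grayscale image and return the Axes used."""
    _, axes = plt.subplots(figsize=(2, 2))
    axes.imshow(digits.images[index], cmap=plt.cm.gray_r)  # higher values draw darker
    axes.set_xticks([])  # hide tick marks; only the image matters
    axes.set_yticks([])
    axes.set_title(f"Label: {digits.target[index]}")
    return axes


digits = load_digits()
image, label = describe_sample(digits, SAMPLE_INDEX)
print(image)  # (a) the two-dimensional pixel array
print(label)  # (a) the digit the image represents
show_sample(digits, SAMPLE_INDEX)
plt.show()    # (b) open a window showing the image
```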
(c) For the Digits dataset, how many samples would the following statement reserve for training and for testing?
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, random_state=11, test_size=0.70)
(d) Write code to get and display the number of training examples and the number of testing examples.
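Subtasks (c) and (d) can be checked together. The sketch below, assuming scikit-learn is installed, runs the split from (c) and reads each array's first dimension, which is its number of samples. As a hand cross-check, it assumes the rounding rule in current scikit-learn releases: a float test_size is multiplied by the sample count and rounded up, and the training set receives the remaining samples. Confirm this against your installed version.

```python
import math

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()  # 1,797 samples with 64 features each

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11, test_size=0.70)

# (d) The first dimension of each array is its number of samples.
print("Training examples:", X_train.shape[0])
print("Testing examples:", X_test.shape[0])

# (c) Cross-check by hand: round the test count up and give the
# remaining samples to the training set.
n_samples = len(digits.data)
n_test = math.ceil(0.70 * n_samples)
n_train = n_samples - n_test
print(f"Expected split: {n_train} training, {n_test} testing")
```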
(e) Rewrite the following list comprehension using a for loop. Hint: create an empty list and then use the list method “append”.
    wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]
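A sketch of the loop form for subtask (e). The short predicted and expected lists are hypothetical stand-ins; in the case study they would presumably hold the classifier's predictions for the test set and the true test labels. Keeping the original comprehension alongside makes it easy to confirm that both forms give the same result.

```python
# Hypothetical stand-ins for the model's predictions and the true labels.
predicted = [0, 1, 5, 3, 8, 5]
expected = [0, 1, 2, 3, 3, 5]

# Original list comprehension, kept for comparison.
wrong_comprehension = [(p, e) for (p, e) in zip(predicted, expected) if p != e]

# Equivalent for loop: start with an empty list and append each mismatch.
wrong = []
for p, e in zip(predicted, expected):
    if p != e:  # keep only predictions that differ from the true label
        wrong.append((p, e))

print(wrong)  # [(5, 2), (8, 3)]
```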
(f) Explain row 3 of the confusion matrix presented in the example we studied in “Intro-to-MachineLearning-Part-II.mp4”:
    [ 0, 1, 130, 0, 0, 0, 0, 1, 6, 0]
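One way to read such a row, assuming the matrix follows scikit-learn's confusion_matrix convention (rows are the true labels, columns are the predicted labels) and that “row 3” means the third row, which corresponds to the digit 2. The sketch separates the diagonal entry (correct predictions) from the off-diagonal entries (misclassifications).

```python
# Row 3 (index 2) from the confusion matrix quoted above.
row = [0, 1, 130, 0, 0, 0, 0, 1, 6, 0]
true_digit = 2  # assumption: the third row corresponds to the digit 2

total = sum(row)           # how many test images really are a 2
correct = row[true_digit]  # diagonal entry: 2s predicted as 2
# Off-diagonal entries: which wrong digit each misclassified 2 was labeled as.
mistakes = {digit: count for digit, count in enumerate(row)
            if digit != true_digit and count > 0}

print(f"Images of a {true_digit}: {total}")
print(f"Correctly predicted: {correct} ({correct / total:.1%})")
print(f"Misclassified as: {mistakes}")
```

Under that convention, the column positions tell you which digit the classifier guessed, so the largest off-diagonal entry points to the digit most often confused with the true one.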
Grading Rubric
– 15 points for finishing Task 3(a)-(f).
– 5 points for appropriate comments.
– 20 points for a runnable rewritten Python program.
– 10 points for screenshots of the program.
Challenges in This Project
1. For 10% extra credit, you are welcome to explore the design of each task. Note: You still have to finish all tasks required by this project.
2. You should configure your machine and PyCharm properly to support project development.
—————x———— Good Luck ————x————–