Dataset: https://www.kaggle.com/datasets/himanshu9712/nypd-…

1. Import the data to clean

The data must be imported using API code. We will cover most of these methods in class.

Kaggle API
o https://www.kaggle.com/code/donkeys/kaggle-python-…

Pandas_datareader
o https://pypi.org/project/pandas-datareader/

Pandas read_ functions (read_html, read_csv, read_json, read_table)
o *read_csv() should point to the ONLINE source, not a local file

Twitter, Reddit, and other APIs available to download from pypi.org
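
As a rough illustration of pointing read_csv() at an online source, here is a minimal sketch. DATA_URL is a hypothetical placeholder (not the real dataset link), load_online_csv is an illustrative helper name, and the sample rows are made up. An in-memory buffer stands in for the URL so the sketch runs without network access.

```python
import io

import pandas as pd

# Hypothetical placeholder: substitute the direct link to the raw CSV file.
DATA_URL = "https://example.com/path/to/data.csv"


def load_online_csv(source):
    """Read a CSV from a URL (or any file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Offline stand-in with made-up rows; with network access, the call would be
# load_online_csv(DATA_URL) instead.
sample = io.StringIO("id,city\n1,BRONX\n2,QUEENS\n")
df = load_online_csv(sample)
print(df.shape)
```

pd.read_csv accepts an http(s) URL string directly, so no separate download step is needed when the file is publicly reachable.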

2. Cleanup

Identify and profile the data
o Are there incorrect column names?
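
One possible way to profile a DataFrame and flag questionable column names is sketched below. The helper names (profile, suspicious_columns), the rules for what counts as "suspicious", and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def profile(df):
    """Summarize each column: dtype, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


def suspicious_columns(df):
    """Return column names with stray whitespace or a default 'Unnamed' label."""
    return [
        c for c in df.columns
        if str(c) != str(c).strip() or str(c).startswith("Unnamed")
    ]


# Made-up sample data with two questionable column names.
df = pd.DataFrame({
    " Age ": [31, None, 45],
    "Unnamed: 0": [0, 1, 2],
    "city": ["NYC", "nyc", "NYC"],
})
summary = profile(df)
bad = suspicious_columns(df)
print(summary)
print(bad)
```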

Standardize the data
o Uppercase, lowercase, dates and times, dashes in SSN or phone numbers, etc.
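
The standardization steps could look something like the following sketch. The column names and values are made up, and the chosen conventions (lowercase column names, title-case names, digits-only phone numbers) are just one reasonable option.

```python
import pandas as pd

# Made-up sample data with inconsistent casing, phone formats, and a bad date.
df = pd.DataFrame({
    "Name": ["  alice SMITH", "Bob jones "],
    "Phone": ["212-555-0100", "(718) 555-0199"],
    "Date": ["2021-03-05", "not a date"],
})

# Normalize column names: trim whitespace and lowercase.
df.columns = df.columns.str.strip().str.lower()

# Trim whitespace and apply consistent casing to text values.
df["name"] = df["name"].str.strip().str.title()

# Keep digits only, which removes dashes, spaces, and parentheses.
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)

# Parse dates; values that do not match the format become NaT.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

print(df)
```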

Deal with missing data
o NaNs
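
A minimal sketch of counting and handling NaNs follows. The data is made up, and the policy shown (drop rows missing a key field, fill numeric gaps with the median) is an assumption; the right choice depends on the dataset.

```python
import numpy as np
import pandas as pd

# Made-up sample data with missing values in both columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, np.nan],
    "city": ["BRONX", None, "QUEENS", "BROOKLYN"],
})

# Count missing values per column before changing anything.
missing_per_column = df.isna().sum()

# Drop rows that lack a value in a column treated as required.
dropped = df.dropna(subset=["city"])

# Fill the remaining numeric gaps with the column median.
filled = dropped.fillna({"age": dropped["age"].median()})

print(missing_per_column)
print(filled)
```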

Remove unnecessary columns or rows
Report how many records were deleted or updated
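
Removing columns and rows while tracking the counts to report might be sketched as follows. The data and column names are made up, and "updated" is counted here as rows whose text value changed during standardization, which is only one possible definition.

```python
import pandas as pd

# Made-up sample data: one duplicate row, one missing city, one unneeded column.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "city": ["nyc", "NYC", "NYC", None, "Boston"],
    "notes": ["a", "b", "b", "c", "d"],
})
rows_before = len(df)

# Remove an unnecessary column, then duplicate rows, then rows missing a city.
cleaned = df.drop(columns=["notes"])
cleaned = cleaned.drop_duplicates()
cleaned = cleaned.dropna(subset=["city"])
rows_deleted = rows_before - len(cleaned)

# Count rows whose city value changes when uppercased, then apply the change.
standardized = cleaned["city"].str.upper()
rows_updated = int((standardized != cleaned["city"]).sum())
cleaned = cleaned.assign(city=standardized)

print(f"Deleted {rows_deleted} records, updated {rows_updated} records")
```

Capturing the row count before each step, rather than only at the end, makes it possible to report how many records each individual step removed.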