Assignment Task

Introduction

Australia is formally divided into distinct “Statistical Area Level 2” (SA2) geographical regions, designed to represent communities of between 3,000 and 25,000 people “that interact together socially and economically”. In this assignment, we’ll focus on the 350+ SA2s within the Greater Sydney area, and you will be tasked with spatially integrating several datasets of various formats to calculate a score for each region.

The picture on the left, provided by State Records NSW, is set in the 1950s and entitled “Bustling Sydney” – an interesting way to describe our city. In many respects, it is quite “bustling” indeed, but the argument could easily be made that the appeal of Sydney is that it doesn’t feel like a big city at all, given its close proximity to natural beauty (beaches, national parks, etc.), overall low population density, and relatively small CBD area. Your task in this assignment is to develop a “bustling” metric for each SA2 region of Greater Sydney, in an attempt to quantify just how busy the districts within our city are.

Preparation

Form a group of 2-3 students (within your enrolled tutorial where possible, or with your tutor’s permission otherwise). Initial data loading and cleaning should be completed in Python, then SQL should be used to merge datasets and produce scores. This code should be collated in a neat, concise Jupyter Notebook file.

This unit’s Week 8 tutorial covers instructions for managing spatial data and the installation of PostGIS (the spatial extension of PostgreSQL) on your local database server. A shapefile of the SA2 digital boundaries can be accessed on the ABS website. Use these, alongside the data sources on Canvas, to complete the tasks below.

Tasks

1. Import all datasets (clean if required) into your PostgreSQL server, using a well-defined data schema. These sources include:

SA2 Regions: Statistical Area Level 2 (SA2) digital boundaries (feel free to filter this down to the “Greater Sydney” GCC).
Businesses: Number of businesses by industry and SA2 region, reported by turnover size ranges.
Stops: Locations of all public transport stops (train and bus) in General Transit Feed Specification (GTFS) format.
Polls: Locations (and other premises details) of polling places for the 2019 Federal election.
Schools: Geographical regions in which students must live to attend primary, secondary and future Government schools.
Population: Estimates of the number of people living in each SA2 by age range (for “per capita” calculations).
Income: Total earnings statistics by SA2 (for later correlation analysis).
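
As a minimal sketch of the cleaning step for the Stops dataset (one possible approach, not a required one), the snippet below converts rows in the GTFS stops.txt layout into EWKT point strings that PostGIS can parse with ST_GeomFromEWKT. The column names stop_id, stop_name, stop_lat and stop_lon come from the GTFS specification; the sample rows, the function name stops_to_ewkt and the choice of SRID 4326 (WGS84) are assumptions made for illustration, and the SA2 boundaries may use a different SRID that you will need to reconcile.

```python
import csv
import io

# Made-up two-row sample in the GTFS stops.txt layout (hypothetical values);
# the real file comes from the data sources on Canvas.
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
200060,Central Station,-33.8840,151.2063
200080,Town Hall Station,-33.8737,151.2069
"""


def stops_to_ewkt(stops_csv, srid=4326):
    """Yield (stop_id, stop_name, ewkt) tuples from GTFS stops.txt text.

    PostGIS points are written as (longitude latitude), the reverse of the
    latitude-first order in which coordinates are usually quoted.
    """
    for row in csv.DictReader(io.StringIO(stops_csv)):
        lat = float(row["stop_lat"])
        lon = float(row["stop_lon"])
        yield row["stop_id"], row["stop_name"], f"SRID={srid};POINT({lon} {lat})"


rows = list(stops_to_ewkt(SAMPLE_STOPS))
for stop_id, name, ewkt in rows:
    print(stop_id, name, ewkt)
```

Each EWKT string can then be inserted into a geometry column, which keeps the coordinate order and SRID explicit rather than relying on defaults.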

2. Compute a score for how “bustling” each individual neighbourhood is according to the formula provided on the next page, where S is the sigmoid function, z is the normalised z-score, and ‘young people’ are defined as anyone aged 0-19. Feel free to only calculate scores for SA2 regions with a population of at least 100, and you are welcome to extend the scoring function however you deem necessary, so long as a rational explanation is provided (e.g. other mathematical standardisation techniques, mitigating the impact of outliers, calculating some metrics per-capita or per-sqkm, etc.).

As a small means of encouraging extensions of the basic suggested scoring function, note that the business definition is intentionally broad – select a cross-section of specific industries within the provided dataset (e.g. “Retail Trade”) that you believe will be the best reflection of how “bustling” the area is (describe your rationale in the report) and use this to calculate the component.
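
The two building blocks named in the scoring task, the z-score z and the sigmoid function S, can be sketched in plain Python as follows. This is an illustration only: it does not reproduce the scoring formula itself, the function names and the sample counts are hypothetical, and the use of the population standard deviation is an assumption you may wish to change. In the assignment proper, the equivalent calculation is expected to be done in SQL.

```python
import math
import statistics


def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: no region stands out, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def sigmoid(x):
    """Logistic sigmoid S(x) = 1 / (1 + e^(-x)), mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-region counts for a single component (e.g. stops per SA2).
stops_per_region = [12, 30, 45, 7, 80]
component = [sigmoid(z) for z in z_scores(stops_per_region)]
print(component)
```

Standardising first puts components measured in different units on a common scale, and the sigmoid then squashes the result into the range 0 to 1 while preserving the ordering of the regions.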

3. Extend the score by sourcing one additional dataset for each group member, and then incorporating all new datasets into your scoring function. For full marks, at least one dataset should be of spatial data, and at least one should be of a type not used so far in this assignment (e.g. JSON, XML, or collated via web scraping). Almost any subject matter is permissible, so long as it can be justified as relevant to the calculation of our “bustling” metric (e.g. public facilities, other census statistics, local wildlife, etc.).

For either version of your scoring function (or both!), the following subtasks should also be achieved:

Visualise your score in an engaging way, and summarise key results in a table (ideally including a useful map-overlay visualisation, or an interactive graph).
Include in-depth analysis of your results. Note interesting findings, discuss their limitations, and summarise key conclusions.
Determine if there is any correlation between your score and the median income of each region.
Ensure at least one useful index (ideally spatial) has been used for your calculations.
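
For the correlation subtask, one common starting point is the Pearson correlation coefficient, sketched below in plain Python. The choice of Pearson is an assumption rather than a requirement of the task, and the function name and sample figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical scores and median incomes for five regions.
scores = [0.21, 0.48, 0.55, 0.73, 0.90]
median_incomes = [41000, 52000, 49000, 61000, 58000]
print(round(pearson_r(scores, median_incomes), 3))
```

Pearson's coefficient only captures linear association, so a scatter plot or a rank-based measure is a sensible cross-check before drawing conclusions.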
