CSCI312:Conceptual modelling of a data warehouse, implementation of 0NF tables in HQL: Big Data Management Assignment, UOW, Singapore

Scope

The objectives of Assignment 2 include conceptual modelling of a data warehouse, implementation of 0NF tables in HQL, implementation of external tables in HQL, and querying a data cube.

Task 1

Consider the following conceptual schema of an operational database owned by a multinational real estate company. The database contains information about the real estate properties offered for sale, owners of the properties, potential buyers who are interested in the properties and real estate agents involved in the selling of the properties.

(1)   Find the total number of real estate properties sold per month, year, street, city, country, and agent involved.

(2)   Find an average asked price of real estate properties sold per month, year, street, city, country, and agent involved.

(3)   Find an average final price of real estate properties sold per month, year, street, city, country, and agent involved.

(4)   Find an average period of time on the market of real estate properties sold per month, year, street, city, country, and agent involved.

(5)   Find the total number of times each real estate property has been sold in a given period of time.

(6)   Find the total number of buyers interested in purchases of real estate properties sold per day, month, year, street, city, country, and agent involved.

Note, the operational database does not contain all information necessary to implement the classes of applications listed above. Additional information must be added when data is transferred from an operational database to a data warehouse.

(1)  Use a short explanation of a database domain and a conceptual schema given above, to find a data cube, that should be implemented by the multinational real estate company to create a data warehouse. In your specification of a data cube, list the facts, the measures, the names of dimensions and the hierarchies.

(2)   Pick any three dimensions from a data cube found in the previous step and at least 4 values in each dimension and one measure to draw a sample three-dimensional data cube in a perspective view similar to a view included in a presentation 09 Data Warehouse Concepts, slide 6.

 Deliverables

A file solution1.pdf that contains

(1)   a specification of data cube as a list of facts, measures, dimensions, and hierarchies obtained as a result of task (1),

(2)   a perspective drawing of a three-dimensional data cube as a result of task

Buy Custom Answer of This Assessment & Raise Your Grades
Get A Free Quote

Task 2 (5 marks)

Conceptual modelling of a data warehouse

An objective of this task is to create a conceptual schema of a sample data warehouse domain described below. Read and analyse the following specification of a data warehouse domain.

A person is represented as either a patient or a medical worker or an administration worker. Medical and administration workers work in medical facilities that have a name, address, and possibly (not obligatory) specialization. Each medical worker is described as a unique staff number at a facility, name, address, and phone number.

A patient visits a medical facility for service of a health problem. Each service involves a patient, a medical worker, and an administration worker. The service can be a diagnosis, treatment, or checkup. A description and date of each service are recorded. Time spent on service and the costs are recorded as well.

A patient is eligible for his or her company health care benefits. Patient data includes name, id number (social security number), address (street, city, state, zip), and phone.

A medical worker must hold one or more credentials that are granted to work in a particular medical facility. Doctors are allowed to deliver diagnosis and give treatment based on their specialization Paramedics are allowed to deliver only emergency diagnosis and treatment for any type of life-threatening problems. Nurses do not deliver diagnosis, but they do participate in treatment, particularly if the patient must be prepared for surgery or remain at the facility overnight.

The administration workers are concerned with personnel needs and assignments. Each medical worker must have at most one assignment at a facility. Several administration workers can be assigned to one assignment.

Medical facilities are located in different suburbs of different cities. A medical facility is uniquely identified by an address.

A data warehouse must be designed such it should be possible to easily implement the following classes of applications.

The management of the medical facilities would like to get from a data warehouse information about

the total number of medical services performed per medical facility, per year, per month per day, per city and per medical worker,
the total length of medical services per medical facility, per year, per month per day, per city and per medical worker,
the average length of medical services per medical facility, per year, per month per day, per city and per medical worker,
the total number of doctors/ paramedics/nurses involved in medical services, per year, per month per day, per medical facility, per city,
the average time spent on medical services per year, month, day,
the total costs of medical services per year, month, day, medical facility, and city.

To draw a conceptual schema, use a graphical notation explained to you in lecture 11 Conceptual Data Warehouse Design.

To create a conceptual schema of a sample data warehouse domain, follow the steps listed below.

Step 1  Find a fact entity, find the measures describing a fact entity.

Step 2  Find the dimensions.

Step 3  Find the hierarchies over the dimensions.

Step 4  Find the descriptions (attributes) of all entity types.

Step 5  Draw a conceptual schema.

To draw your diagram, you can use UMLet diagram drawing tool and apply a “Conceptual modelling” notation, Selection of a drawing notation is available in the right upper corner of the main menu of UMLet diagram drawing tool.

Deliverables

A file solution2.pdf with a drawing of a conceptual schema of a sample data warehouse domain.

Task 3 

Implementation of a table with a complex column type (0NF table) in Hive

Assume that we have a collection of semi-structured data with information about the employees (unique employee number and full name) the projects they are assigned to (project name and percentage of involvement) and their programming skills (the names of known programming languages). Some of the employees are on leave and they are not involved in any project. Also, some of the employees do not know any programming languages. Few sample records from the collection are listed below.

007|James Bond|DB/3:30,Oracle:25,SQL-2022:100|Java,C,C++

008,Harry Potter|DB/3:70,Oracle:75|

010,Robin Banks| |C,Rust

009,Robin Hood| |

(1)   Implement HQL script solution3.hql that creates an internal relational table to store information about the employees, the projects they are assigned to (project name and percentage of involvement) and their programming skills.

(2)   Include into the script INSERT statements that load sample data into the table. Insert at least 5 rows into the relational table created in the previous step. Two employees must participate in few projects and must know few programming languages. One employee must participate in few projects and must not know any programming languages. One employee must know few programming languages and must not participate in any projects. One employee must not know programming languages and must not participate in the projects.

(3)   Include into the script SELECT statements that list the contents of the table. When ready, use a command line interface beeline to process a script solution3.hql and to save a report from processing in a file solution3.rpt. If the processing of the file returns the errors, then you must eliminate the errors!

Deliverables

A file solution3.rpt with a report from the processing of HQL script solution3.hql. The report MUST NOT include any errors, and the report must list all SQL statements processed.

Task 4 

Implementation of a data warehouse as a collection of external tables in Hive

Consider the following two-dimensional data cube.

The data cube contains information about that parts that can be shipped by the suppliers. Download and unzip a file task4.zip. You should obtain a folder task4 with the following files: part.tbl, supplier.tbl, partsupp.tbl. Use an editor to examine the contents of *.tbl files. Note, that the contents of the files can be loaded into the relational tables obtained from the transformation of the two-dimensional data cube given above into the relational table PART, SUPPLIER, and PARTSUPP.

Transfer the files into HDFS.

Implement HQL script solution4.hql that creates the external tables obtained from a step of logical design performed earlier. The external tables must overlap on the files transferred to HDFS in the previous step. Note, that a header in each *.tbl file must be removed before creating the external tables.

Include into solution4.hql script SELECT statements that any 5 rows from each one of the external tables implemented in the previous step and the total number of rows included in each table.

When ready, use a command line interface beeline to process a script solution4.hql and to save a report from processing in a file solution4.rpt.

Deliverables

A file solution4.rpt with a report from the processing of HQL script solution4.hql.

Stuck with a lot of homework assignments and feeling stressed ?
Take professional academic assistance & Get 100% Plagiarism free papers
Get A Free Quote

The post CSCI312:Conceptual modelling of a data warehouse, implementation of 0NF tables in HQL: Big Data Management Assignment, UOW, Singapore appeared first on Singapore Assignment Help.