Description
The final project for BUAN 406 is a chance for students to apply the business analytics methods learned throughout the course. The project can take whatever form the data and the business question or goal require, and may consist of:
Statistical hypothesis testing 
Testing particular models and examining p-values with regard to specific null hypotheses
Regression Analysis: (chp)
Using a regression model to estimate parameters and interpreting the model results: coefficients, R-squared, F-statistic, chi-square statistic, etc.
Classification Analysis:
Using clustering or classification techniques, explaining the purpose of the chosen methods, and measuring classification performance
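As an illustration of the hypothesis-testing option, the sketch below runs a two-sided permutation test for a difference in means using only the Python standard library. The permutation test is one possible choice, not a method prescribed by the course, and the data, the function name `permutation_test`, and its parameters are hypothetical.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution,
    so the group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical data (made up for illustration): order values under two promotions.
promo_a = [12.1, 14.3, 11.8, 15.2, 13.9, 12.7]
promo_b = [15.8, 17.1, 16.4, 14.9, 18.2, 16.0]
p_value = permutation_test(promo_a, promo_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unusual if the promotion had no effect; the report should state the null hypothesis and the significance threshold explicitly.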
The outcome of the project should be a 2-4 page written report containing the figures and tables needed to justify how you treated the data, the modeling approach you adopted, and the final results. Based on those final results, a business impact analysis is required for the data and the system being investigated.
The application of particular approaches can be driven by what you feel is the most appropriate to your chosen dataset, or what set of methods you think would be most useful to you in the future.
The following pages contain a skeleton framework for the desired sections and content of the written report, to be used as a guideline when preparing it. Word limits are a guide only, but try to stick to them; please avoid novels or half-page notes.
Skeleton Report
Summary (150-250 words)
Summary conveying the aim of the report, the data being investigated, a broad overview of the methods applied, the major findings of those analyses, and your interpretation. Keep the summary as descriptive as possible, with numbers (parameter estimates, p-values, classification error, etc.) kept to a bare minimum.
Introduction (200-400 words)
Section introducing the data in broad terms (i.e. when/where/how it was collected, and sample size), any hypotheses you might have about the data that you want to test, and why, and/or the rationale of the analysis approach. This should finish with a succinct set of business aims/goals/hypotheses that can be referred back to when examining the results.
Describe the sampling plan for the dataset (use reasonable assumptions about the sample you observed).
Methods (~1-2 pages, including tables and/or figures)
Section describing the analysis that you performed on the data, containing:
In-depth examination of the data 
A table showing sample size, types (continuous, discrete, factorial), and a description of the response variables and predictors might be useful. A summary of descriptive statistics can also be helpful.
How you dealt with the data 
Data cleaning (no missing values) and exploration of different transformations should go here, with the final results section containing only models on the most appropriate transform.
Analysis background, methods, assumptions, and justification of approach relative to business aims
Background: Who is your customer? What is your customer's business? Why is the problem important? How does it affect the business?
Variables of your model: What is the target (the variable related to the problem)? What are the predictors of the model?
Method: What is the solution? How do you deal with the target variable? What are the tools and models used in your solution?
Regression/classification model building/testing approach, diagnostics used to test assumptions if any, how/if predictor interactions were selected
Any technical challenge?
Post-processing steps
If models were used to obtain predictions, how was this done
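The variable table suggested under "In-depth examination of the data" could be produced along the following lines. This is a minimal sketch using only the Python standard library; the records, the variable names, and the `summarize` helper are hypothetical, and the type rule (integers as discrete, other numbers as continuous, everything else as a factor) is a simplifying assumption.

```python
from statistics import mean, stdev

# Hypothetical records (made up for illustration); replace with your dataset.
records = [
    {"age": 34, "plan": "basic", "monthly_spend": 42.5},
    {"age": 51, "plan": "premium", "monthly_spend": 88.0},
    {"age": 27, "plan": "basic", "monthly_spend": 39.9},
    {"age": 45, "plan": "premium", "monthly_spend": 91.2},
]

def summarize(rows):
    """Return one summary dict per variable: type, sample size, basic statistics."""
    summary = []
    for name in rows[0]:
        # Drop missing values so n reflects the usable sample size.
        values = [row[name] for row in rows if row[name] is not None]
        if all(isinstance(v, (int, float)) for v in values):
            is_int = all(isinstance(v, int) for v in values)
            kind = "discrete" if is_int else "continuous"
            stats = f"mean={mean(values):.2f}, sd={stdev(values):.2f}"
        else:
            kind = "factor"
            stats = "levels=" + ", ".join(sorted(set(values)))
        summary.append({"variable": name, "type": kind, "n": len(values), "stats": stats})
    return summary

table = summarize(records)
print(f"{'variable':<15}{'type':<12}{'n':<4}stats")
for row in table:
    print(f"{row['variable']:<15}{row['type']:<12}{row['n']:<4}{row['stats']}")
```

In the report, the same information would normally appear as a formatted table with a short description of each variable added by hand.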
Results (~1-2 pages, including tables and/or figures)
Section describing the results of the analysis, including, where applicable:
Results of modelling approach regarding what predictors were important, direction/shape (if polynomial, or non-linear methods were used) of model effects, and/or statistical significance of predictors
Table giving parameter estimates (and SE or 95% CI), p-values and overall measures of model fit (if relevant)
Plots of raw data (Response ~ Predictors) and fitted model(s) (including model uncertainty)
Any metrics that you used
Evaluation: What is the metric? How does it relate to business value? What is the metric score? What is the benchmark? How good is it compared to the benchmark?
Impact: How will your customer use the results you generate? 
Overall try to keep the number of figures and tables to a minimum required to convey the main results of the business analysis – preferably 2 of each and no more than 3.
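For a regression project, the quantities for the parameter table (estimates, standard errors, and overall fit) can be computed as in the sketch below. It assumes a simple linear regression with one predictor and uses only the Python standard library; the data and the `simple_ols` helper are hypothetical. P-values and exact confidence intervals need the t and F distributions, which a statistics package would supply, so they are left out here.

```python
from math import sqrt
from statistics import mean

def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)        # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares
    sigma2 = sse / (n - 2)                      # residual variance estimate
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx)),
        "se_slope": sqrt(sigma2 / sxx),
        "r_squared": 1 - sse / sst,
        "f_statistic": (sst - sse) / sigma2,    # 1 and n - 2 degrees of freedom
    }

# Hypothetical data (made up for illustration): ad spend vs. sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(ad_spend, sales)
for name, value in fit.items():
    print(f"{name:<14}{value:.4f}")
```

With a single predictor, the F-statistic equals the square of the slope's t-statistic, which is a quick consistency check on the table.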
Discussion/Conclusions (200-500 words)
Section describing your major findings, your interpretation of those findings (correlation versus causation), what they mean with regard to your initial hypotheses, and your understanding of the system. This could also include the impact on the business, any unexpected findings, future steps, and/or project shortcomings if relevant.
A Business Example:
Imagine you are a sales representative. Every day, you get a long list of leads from the marketing team and choose some of them to call.
1. Who is your customer?
Sales representatives
2. What is your customer’s business?
They call leads to sell products
3. What is the problem bothering your customer?
Difficult to identify real customers who will purchase
4. Why is the problem important? How does it affect business?
Time wasted on people who will never purchase reduces the revenue the representatives can generate
5. What is the target (the variable related to the problem)?
Whether a lead is a real customer
6. What is the solution? How to deal with the target variable?
Customer Identifier: predicting who are real customers
7. What is the metric? How does it relate to business value?
Precision. Doubling Precision doubles revenue.
8. What is the benchmark?
Random selection of leads to call; precision equals the base rate.
9. How will your customer use the results you generate?
Provide lists of high-ranked leads for the representatives to call
10. What are the tools and algorithms used in your solution?
Logistic Regression, Random Forest; SMOTE for imbalanced data; SHAP for getting insights
11. Any technical challenge? (Frequently asked in interviews. You need a story)
Data collection. The data came from different sources in various formats, so it was very challenging to understand and align them. I proactively communicated with the sales and engineering teams to get all the data.
12. What is the metric score? How good is it compared to the benchmark?
The precision score is 20%, 100% higher than the traditional method.
13. What is the impact on business?
Doubled the revenue with the same investment on sales
14. Any unexpected findings? (Optional)
Found an unexpected motivation for customers to purchase.
Context: Our customers are sales representatives. They call leads to sell products.
Problem: However, on average, only 10% of the leads they call will purchase. 90% of the sales resources are wasted.
Solution: To solve this problem, I built a Customer Identifier to help them identify the leads who will purchase.
Impact: The precision score is 20%, meaning 2 out of 10 predicted customers will purchase. This doubled the revenue with the same investment in sales.
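The arithmetic behind the example can be checked with a short sketch. The counts below are hypothetical and chosen only to match the 10% base rate and 20% precision quoted above; the sketch also assumes a fixed number of calls per day and equal revenue per sale, which is what makes revenue scale with precision.

```python
def precision(true_positives, false_positives):
    """Share of leads flagged as customers who actually purchase."""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts chosen to match the percentages in the example.
calls_per_day = 100
base_rate = 0.10  # benchmark: random selection of leads
model_precision = precision(true_positives=20, false_positives=80)

# Expected purchases per day for the same number of calls.
sales_random = calls_per_day * base_rate
sales_model = calls_per_day * model_precision
lift = model_precision / base_rate

print(f"precision: {model_precision:.0%}, benchmark: {base_rate:.0%}")
print(f"expected sales per {calls_per_day} calls: {sales_random:.0f} vs {sales_model:.0f}")
print(f"lift over benchmark: {lift:.1f}x")
```

Reporting the metric next to its benchmark in this way makes the business value explicit: the same calling effort yields twice as many expected purchases.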