Question 1
Suppose you are a data scientist at IMDB and would like to develop a machine-learning model for sentiment analysis of movie reviews. Your goal is to classify viewers’ reviews as either positive or negative. In this question, you will use the IMDB dataset, which contains 50,000 Internet movie reviews. You are to use the following code to download the dataset:
By default, half of the data are saved as train_data and the remaining half as test_data. The target variable is the label, which you are going to predict: a positive review is labelled 1 and a negative review is labelled 0.
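The download code referred to above is not reproduced in this copy. The following is a minimal sketch, assuming the missing snippet used the built-in Keras loader `keras.datasets.imdb.load_data` and assuming a 5,000-word vocabulary to match the input layer size in part (a). The names `load_imdb` and `vectorize_sequences` are hypothetical helpers introduced here for illustration; the second one multi-hot encodes each review so it can be fed to a dense network.

```python
import numpy as np


def load_imdb(num_words=5000):
    """Download the IMDB reviews, keeping only the `num_words` most frequent words.

    Assumption: the assignment's missing snippet used the built-in Keras loader.
    TensorFlow is imported here so the encoder below can be used without it.
    """
    from tensorflow.keras.datasets import imdb

    (train_data, train_labels), (test_data, test_labels) = imdb.load_data(
        num_words=num_words
    )
    return (train_data, train_labels), (test_data, test_labels)


def vectorize_sequences(sequences, dimension=5000):
    """Multi-hot encode lists of word indices into an (n_reviews, dimension) matrix."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, list(sequence)] = 1.0  # mark every word index that occurs
    return results
```

Under these assumptions, `vectorize_sequences(train_data)` yields a 25,000 × 5,000 matrix (roughly 500 MB in float32), and the labels can be used as returned by the loader.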
(a) (Python code) Suppose you would like to use a neural network method to build the machine learning model. The designed neural network architecture is summarised as follows:
• Input layer size: 5,000
• First hidden layer size: 32, followed by a ‘relu’ activation function.
• Second hidden layer size: 16, followed by a ‘relu’ activation function.
• Third hidden layer size: 8, followed by a ‘relu’ activation function.
• Output layer size: 1, followed by a ‘sigmoid’ activation function.
• Optimizer: ‘rmsprop’, loss: ‘binary_crossentropy’, metrics: ‘accuracy’
• Batch size: 512, epochs: 10
Use the keras library to implement the above neural network.
(b) Report the training loss, testing loss, training accuracy, and testing accuracy of the above model (rounded to three decimal places). Discuss your findings.
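Parts (a) and (b) could be approached along the following lines. This is a sketch, not a model answer: it assumes the reviews have already been encoded as 5,000-dimensional vectors, and it assumes "training loss" means the loss evaluated on the full training set after the last epoch (reading it from the fit history instead would also be defensible). The function names are hypothetical.

```python
def build_model(input_dim=5000):
    """Dense network from part (a): 5000 -> 32 -> 16 -> 8 -> 1."""
    # Imported inside the function so dense_param_count works without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model


def train_and_report(model, x_train, y_train, x_test, y_test):
    """Fit with the stated batch size and epochs, then return rounded metrics."""
    model.fit(x_train, y_train, batch_size=512, epochs=10, verbose=0)
    train_loss, train_acc = model.evaluate(x_train, y_train, verbose=0)
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return {
        "train_loss": round(float(train_loss), 3),
        "train_accuracy": round(float(train_acc), 3),
        "test_loss": round(float(test_loss), 3),
        "test_accuracy": round(float(test_acc), 3),
    }


def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected stack, to check model.summary()."""
    return sum(
        n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
```

As a sanity check, `dense_param_count([5000, 32, 16, 8, 1])` gives 160,705 trainable parameters, which should agree with `model.summary()`. A large gap between training and testing metrics in the reported dictionary would be the usual sign of overfitting to discuss in part (b).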
(c) (Python code) Find the optimal value of “epochs” in your neural network model. [Remark: You are encouraged to use plots to improve the clarity of your explanation.]
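One common way to approach part (c) is to hold out part of the training data as a validation set, record the validation loss after every epoch, and pick the epoch where it is lowest. The sketch below assumes the per-epoch losses come from the Keras fit history, for example `history.history["loss"]` and `history.history["val_loss"]` after calling `model.fit(..., validation_split=0.2)`; the 20% split and both function names are assumptions made for illustration.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


def plot_losses(train_losses, val_losses):
    """Plot training and validation loss per epoch and mark the best epoch."""
    import matplotlib.pyplot as plt  # imported here so best_epoch has no dependencies

    epochs = range(1, len(train_losses) + 1)
    plt.plot(epochs, train_losses, "o-", label="training loss")
    plt.plot(epochs, val_losses, "s-", label="validation loss")
    plt.axvline(best_epoch(val_losses), linestyle="--", color="grey")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()
```

Training loss normally keeps falling, while validation loss tends to bottom out and then rise; the epoch at that minimum is a reasonable candidate for the optimal value.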
(d) (Python code) Propose at least one way to improve your model’s performance. You need to report the accuracy of your improved model on the testing dataset, take a screenshot of your code, and discuss how the chosen method can improve the model’s performance.
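For part (d), one possible option (among several, such as a smaller network or weight regularization) is to add dropout between the hidden layers and stop training when the validation loss stops improving. The sketch below is an illustration under assumptions, not the expected answer: the dropout rate of 0.5, the patience of 2, the 20% validation split, and the function names are all choices made here.

```python
def build_regularized_model(input_dim=5000, dropout_rate=0.5):
    """Same layer sizes as part (a), with dropout after the first two hidden layers."""
    from tensorflow import keras  # lazy import: TensorFlow is only needed when called

    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(dropout_rate),  # randomly zeroes units during training
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model


def fit_with_early_stopping(model, x_train, y_train, max_epochs=20, patience=2):
    """Train until validation loss has not improved for `patience` epochs."""
    from tensorflow import keras

    stopper = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True
    )
    return model.fit(
        x_train,
        y_train,
        batch_size=512,
        epochs=max_epochs,
        validation_split=0.2,  # assumption: hold out 20% of the training data
        callbacks=[stopper],
        verbose=0,
    )
```

Dropout discourages the network from relying on any single word feature, and early stopping keeps the weights from the epoch with the best validation loss; whether either actually raises testing accuracy has to be confirmed by evaluating on test_data.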
(e) Suppose a confusion matrix with 100 samples is used to evaluate the performance of the machine learning model. The results are summarized as follows
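The confusion matrix itself is missing from this copy, so the actual counts cannot be shown. As a sketch of how the usual metrics follow from any such matrix, the hypothetical helper below computes accuracy, precision, recall, and F1 from the four cells; the counts in the example are placeholders that merely sum to 100 and are not the assignment's values.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Placeholder counts summing to 100; NOT the values from the assignment's table.
example = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these placeholder counts, accuracy is 0.85, precision is 0.80, recall is about 0.889, and F1 is about 0.842.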
(f) (Python code) You realize that logistic regression is also commonly used for classification tasks. Implement a logistic regression model using the sklearn.linear_model module, report its testing accuracy, and compare the suitability of logistic regression and the above neural network for this task.
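A minimal sketch for part (f), assuming the same encoded feature matrices used for the neural network are passed in. The function name is hypothetical, and raising `max_iter` to 1000 is an assumption made here to help the solver converge on high-dimensional inputs.

```python
from sklearn.linear_model import LogisticRegression


def logistic_regression_accuracy(x_train, y_train, x_test, y_test, max_iter=1000):
    """Fit logistic regression and return (fitted model, rounded test accuracy)."""
    # max_iter is raised above the default as an assumption to aid convergence.
    clf = LogisticRegression(max_iter=max_iter)
    clf.fit(x_train, y_train)
    test_accuracy = round(float(clf.score(x_test, y_test)), 3)
    return clf, test_accuracy
```

Because the model is linear in the word indicators, its coefficients (`clf.coef_`) can be read directly as per-word evidence for a positive or negative review, which is one point to weigh against the neural network when comparing suitability.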
FIN313: Machine Learning and AI for FinTech Assignment, SUSS, Singapore.