ANL305: Association and Clustering
A research study reported that one in five hotel bookings is canceled before the guest arrives at the hotel. A last-minute cancellation or no-show leaves the room unoccupied, which results in a loss of revenue for the day. Even if the hotel manages to sell the room again, it would probably be at a much lower rate because the sale is too close to the check-in date.
For a hotel, a high cancellation ratio not only impacts the revenue but also leads to poor marketing and low market share. To improve business, the marketing manager of hotel H would like to learn more about the booking cancellation issue.
As a data analyst in the marketing department, you are given a relevant dataset, Hotel_Booking_Cancellation.csv, which contains a total of 2,417 hotel booking records. Your job is to help the hotel understand the possible factors that affect booking cancellation and to assist your manager in evaluating the usefulness of the constructed models for the given problem. The details of the dataset are as follows:
(a) Describe the business objective and data mining objective of the given business problem. Appraise whether the Association Rule Mining method is appropriate to study this problem.
(b) Construct an Apriori model on the dataset using IBM SPSS Modeler. The model details and interpretation of the results should include the following:
(i) Based on the given business problem, discuss how to set the measurement level and role of each field, and provide a screenshot of the final settings. (Note: exclude the variable lead_time from the association analysis.)
(ii) Set the Minimum Support = 10%, Minimum Confidence = 60%, Maximum number of antecedents = 5. Report the number of rules generated, and give a screenshot of the rules.
(iii) Analyse the generated rules by considering various evaluation measures. Is there any pattern that helps to explain booking cancellations?
(iv) Change the parameter settings to: Minimum Support = 1%, Minimum Confidence = 10%, Maximum number of antecedents = 5, Evaluation measure = Confidence Difference, and Evaluation measure lower bound = 50. Report the generated rules, then compare them with the rules generated in part (iii) and discuss the similarities and/or differences.
(c) Suppose your line manager would like to study the possible impact of lead_time using the Apriori algorithm. Discuss whether data preparation is required. If your answer is Yes, illustrate the details of the data preparation and report the processed result.
(Note: you may use IBM SPSS Modeler or any other tool for the data preparation, if any is required; there is no need to implement the association analysis.)
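For orientation only, the measures named in part (b) and the kind of preparation asked about in part (c) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a model answer. The toy records, the field names `deposit` and `canceled`, and the bin edges 7/30/90 days are all hypothetical and are not taken from Hotel_Booking_Cancellation.csv. The sketch also assumes that "support" means antecedent support and that confidence difference is the absolute gap between a rule's confidence and the prior confidence of its consequent, which is our understanding of how the Apriori node in IBM SPSS Modeler defines them; check the product documentation before relying on this.

```python
from bisect import bisect_left

# Hypothetical toy records; names and values are illustrative only.
records = [
    {"deposit": "non_refund", "lead_time": 120, "canceled": True},
    {"deposit": "non_refund", "lead_time": 95, "canceled": True},
    {"deposit": "non_refund", "lead_time": 45, "canceled": True},
    {"deposit": "none", "lead_time": 5, "canceled": False},
    {"deposit": "none", "lead_time": 2, "canceled": False},
    {"deposit": "none", "lead_time": 60, "canceled": True},
    {"deposit": "none", "lead_time": 14, "canceled": False},
    {"deposit": "none", "lead_time": 200, "canceled": False},
]


def matches(record, conditions):
    """Return True if the record satisfies every field == value condition."""
    return all(record[field] == value for field, value in conditions.items())


def rule_measures(rows, antecedent, consequent):
    """Compute common evaluation measures for the rule antecedent -> consequent."""
    n = len(rows)
    n_ante = sum(matches(r, antecedent) for r in rows)
    n_cons = sum(matches(r, consequent) for r in rows)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    prior = n_cons / n  # share of all records where the consequent holds
    confidence = n_both / n_ante if n_ante else 0.0
    return {
        "antecedent_support": n_ante / n,  # assumed meaning of "support"
        "rule_support": n_both / n,
        "confidence": confidence,
        "prior_confidence": prior,
        "confidence_difference": abs(confidence - prior),
        "lift": confidence / prior if prior else 0.0,
    }


def bin_lead_time(days, edges=(7, 30, 90), labels=("0-7", "8-30", "31-90", "91+")):
    """Map a numeric lead time in days to a category label.

    The edges are inclusive upper bounds and are an arbitrary assumption.
    """
    return labels[bisect_left(edges, days)]


# Measures for the hypothetical rule: deposit = non_refund -> canceled = True
measures = rule_measures(records, {"deposit": "non_refund"}, {"canceled": True})

# Add a categorical version of lead_time so it could be used as a rule item.
binned = [dict(r, lead_time_bin=bin_lead_time(r["lead_time"])) for r in records]
```

Apriori works on categorical or flag items, which is why the continuous lead_time is converted to a small set of labelled ranges here. In practice the cut points would be chosen from the actual distribution of lead_time (for example, quantiles or business-meaningful thresholds) rather than the fixed edges assumed above.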