Introduction
The decision tree algorithm is built on the general idea of a decision tree: a graph that represents all possible answers to a decision. Decision trees are used to extract knowledge by exploring the different scenarios that result from making certain sequential decisions.
The tree has a set of nodes (objects containing data). A root node sits at the top. The root node contains branches, which are scenarios that point to child nodes. The final child nodes from which no more scenarios branch are called leaf nodes, and they contain the answers to the preceding scenarios.
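To make the structure concrete, a minimal Python sketch of such a node follows (all names here are illustrative, not taken from any particular library):

```python
# Illustrative node structure: a decision node tests an attribute and holds
# one child per branch scenario; a leaf node holds the final answer.
class Node:
    def __init__(self, attribute=None, answer=None):
        self.attribute = attribute  # attribute tested at this node (None for a leaf)
        self.answer = answer        # answer stored at a leaf node
        self.children = {}          # maps a branch scenario to a child Node

    def is_leaf(self):
        return not self.children

# Example: a root node testing "weather", branching to two leaf nodes.
root = Node(attribute="weather")
root.children["sunny"] = Node(answer="play outside")
root.children["rainy"] = Node(answer="stay home")
```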
How to build a decision tree (Cynthia Rudin 2012)
- Start at the top of the tree, which contains the total dataset.
- Using an Attribute Selection Measure (ASM), determine the best attribute.
- Sub-divide the data into smaller sets depending on the best attribute.
- Create the root decision node containing the best attribute.
- Recursively create decision nodes using the smaller data sets just created. Continue this process until a stage is reached where the nodes can no longer be classified; these final nodes will be the leaf nodes (a minimal sketch of this procedure follows below).
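A minimal Python sketch of this recursive procedure, reusing the Node class from earlier; best_attribute is a deliberately trivial placeholder for an Attribute Selection Measure like those described in the next section:

```python
# Build a decision tree from a list of dict rows; `label` names the class column.
def majority_label(rows, label):
    values = [row[label] for row in rows]
    return max(set(values), key=values.count)

def best_attribute(rows, attributes, label):
    # Placeholder ASM: a real implementation would pick the attribute with the
    # highest information gain or lowest Gini index (see the next section).
    return attributes[0]

def build_tree(rows, attributes, label):
    # Stop when the node can no longer be classified further: make a leaf.
    if len({row[label] for row in rows}) == 1 or not attributes:
        return Node(answer=majority_label(rows, label))
    best = best_attribute(rows, attributes, label)      # pick the best attribute
    node = Node(attribute=best)                         # decision node for it
    for value in {row[best] for row in rows}:           # sub-divide the data
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node.children[value] = build_tree(subset, remaining, label)  # recurse
    return node
```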
ATTRIBUTE SELECTION MEASURES
In the implementation of decision trees, the main problem is how attributes are selected for the nodes. Two popular techniques for selecting the best attribute are information gain and the Gini index.
Information gain is a measurement of the change in the amount of uncertainty, or information, after a dataset is sub-divided based on an attribute.
It calculates how much information an attribute provides about the data; according to the value of the information gain, the node is split and the decision tree is built.
A decision tree algorithm always tries to maximize the value of information gain, so the attribute with the highest information gain is always split first.
The information gain can be calculated using the following formula:
Information Gain = Entropy(dataset) - [Weighted Average * Entropy(each attribute)]
Where,
Weighted Average is where each data point's value is multiplied by its assigned weight; the results are then summed and divided by the number of data points (Ganti 2020).
Entropy is a measure of the uncertainty or impurity of the data and specifies the randomness in the data. The higher the entropy, the harder it is to draw any conclusions from that information. It is calculated as:
Entropy(S) = -Σ Pi log2(Pi)
Where,
- Pi is the probability of an event i of state S
- S is the current state
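Assuming the dataset is a list of dictionary-style rows whose class column is named by `label` (the same convention as the earlier sketches), the two formulas above could be written as:

```python
from collections import Counter
from math import log2

def entropy(rows, label):
    # Entropy(S) = -sum(Pi * log2(Pi)) over the class probabilities Pi.
    total = len(rows)
    return -sum((n / total) * log2(n / total)
                for n in Counter(row[label] for row in rows).values())

def information_gain(rows, attribute, label):
    # Gain = Entropy(dataset) - weighted average of Entropy(each subset),
    # weighting each subset by the fraction of rows it contains.
    total, remainder = len(rows), 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, label)
    return entropy(rows, label) - remainder

rows = [{"weather": "sunny", "play": "yes"}, {"weather": "sunny", "play": "yes"},
        {"weather": "rainy", "play": "no"}, {"weather": "rainy", "play": "yes"}]
print(information_gain(rows, "weather", "play"))  # about 0.31
```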
The Gini index ‘calculates the amount of probability of a specific feature that is classified incorrectly when selected randomly’ (Tyagi 2020). It performs only binary splits, and it is used in the classification and regression tree (CART) algorithm to weight attributes.
It is calculated using the following formula:
Gini Index = 1 - Σ (Pi)²
Where Pi denotes the probability of an attribute being classified for a distinct class (Tyagi 2020).
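A matching sketch of the Gini index under the same row convention; a pure node scores 0 and a 50/50 split scores 0.5:

```python
from collections import Counter

def gini_index(rows, label):
    # Gini(S) = 1 - sum(Pi^2) over the class probabilities Pi.
    total = len(rows)
    return 1.0 - sum((n / total) ** 2
                     for n in Counter(row[label] for row in rows).values())

print(gini_index([{"c": "yes"}, {"c": "yes"}], "c"))  # 0.0 (pure node)
print(gini_index([{"c": "yes"}, {"c": "no"}], "c"))   # 0.5 (maximally mixed)
```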
Pruning or post-pruning decision trees
‘Pruning means simplifying/compressing and optimizing the decision tree’ (Wikipedia 2020) by removing sections of the tree that are redundant to the classification. The splitting process results in fully grown trees in which every possible scenario is explored; in most cases such a tree is likely to overfit the data (correspond too closely to one particular set of data), leading to poor accuracy on new or unseen data.
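As one practical illustration, scikit-learn exposes post-pruning through minimal cost-complexity pruning; a minimal sketch, assuming scikit-learn 0.22 or later and a generic example dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compute the candidate pruning strengths (ccp_alpha values) for this data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# A larger ccp_alpha removes more redundant sections of the fully grown tree.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0,
                                ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print(full.tree_.node_count, "->", pruned.tree_.node_count)  # far fewer nodes
```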
Pre-pruning decision trees
This technique tries to stop the tree-building process early. At each stage of the splitting process, the cross-validation error (an estimate of the difference between the expected value and the result) is checked; when the error no longer changes significantly after further splits, the splitting is stopped. However, this may result in under-fitting if the splitting is stopped too early.
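A hedged sketch of pre-pruning with scikit-learn: early-stopping parameters limit tree growth, and cross-validation scores show when deeper splitting stops helping (parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in (1, 2, 3, 4, None):  # None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=depth, min_samples_split=5,
                                  random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(depth, round(score, 3))  # stop deepening once the score plateaus
```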
History
The work on tree-based models to solve problems can be traced back to 1963, when Morgan and Sonquist used them in their Automatic Interaction Detection (AID) program (Explorium Data Science Team 2019).
In 1966 there was a publication in psychology where the decision tree methods were used to model the concept of human learning. While doing this, researchers discovered that the algorithm was useful for programming (Explorium Data Science Team 2019).
In 1977, Breiman, Friedman, Olshen and Stone began developing the first classification and regression tree (CART) version, published in 1984, which became the universal standard for decision trees (Explorium Data Science Team 2019).
In 1986, John Ross Quinlan proposed a new concept: trees with multiple answers, unlike previous decision tree algorithms whose answers were binary. Quinlan invented ID3 (Iterative Dichotomiser 3), which could cover non-binary questions. ID3 was not ideal, so its author continued to improve the algorithm's structure, and this is how C4.5 was born. C4.5 addressed the shortcomings of its predecessor, ‘ranking number 1 in the Top 10 Algorithms in Data Mining pre-eminent paper’ (Springer LNCS 2008). By that time the algorithm had existed for 15 years (Explorium Data Science Team 2019).
Usage
Decision tree classifier algorithms are well known and used in a wide variety of areas; their ease of use and easy visualization make them one of the best algorithms for analyzing and understanding data. The algorithm is used in many fields to inform decision making and to make predictions. For instance, it is used to detect fraud in the insurance sector, to classify cancers in the medical sector, and to identify good loans in the lending sector. Interestingly, the oil company BP used the algorithm to separate oil and gas on offshore platforms, which saved it millions (Cynthia Rudin 2012).
Pros
It is a simple algorithm to explain and interpret, as it mirrors human decision making (Narula 2019). The algorithm is useful for analyzing and graphically showing how users interact with the streaming service. It can deal with both categorical and numerical data. For the streaming service, it can answer different types of questions, such as which type of users sign up the most and how many people stay on the platform for extended periods of time.
It is comprehensive, as all scenarios are considered and every path to a conclusion can be traced back, so further analysis can be done. This feature allows the business to track and see what discouraged users from signing up.
Decision trees are versatile and can be used to solve many business problems.
Cons
They are inaccurate, as they tend to overfit the data (correspond too closely to one particular set of data), so when the data changes the predictions become inaccurate. This can be somewhat mitigated by pruning or by using a random forest.
The whole tree has to be recreated to incorporate new data (Narula 2019).
As the dataset becomes very large, the tree can correspondingly become extremely large and complex, which leads to slow classifications. For the streaming service, however, the dataset under consideration is not enormous.
THE PROBLEM
The problem focuses on the paid online services industry and is a relatively familiar problem for many subscription services. It is a marketing and sales problem where online subscription-based services get a low number of paid subscriptions.
The hypothetical online subscription business is trying to decide how to improve its user retention rate and increase sales by assessing different solutions such as reducing the price, adding or removing features, or finding a different way of generating revenue from the service, like using ads. It needs to assess these solutions and find out which one will serve the business best.
Solving the sales and marketing problems will increase the revenue of the business. It will also guide the service on how it can be improved, which will invite even more users to sign up.
Hypothetically, the problem can be decomposed into smaller categories such as pricing, features and marketing. These categories can be decomposed further; for instance, the pricing category could be decomposed into ‘decrease’, ‘free’ and ‘keep the same’. Each of these smaller problems can be examined and resolved, which would lead to an overall better system and more user subscriptions.
The problem can be generalized into one of how to best attract customers and retain them, which is a problem found in almost any business. The problem becomes complex to solve if the business is affected by outside factors over which it has no control.
APPLYING THE DECISION TREE CLASSIFIER ALGORITHM
Imagine a scenario where a business provides an online streaming subscription service. The number of paid subscriptions has been identified as being relatively low compared to users who used the free trial version. Of the total users who signed up for the free trial, only about 20% of those users subscribed to the paid service after their free trial had ended.
The algorithm, in this case, will be used for analysis to aid in the decision-making process. It will help identify weaknesses in the structure of the streaming service, and it will also aid in mapping out user demographics and preferences. For instance, if the data showed that people searched for certain items that were unavailable on the service, and that the majority of the users who did not find what they wanted were the ones who did not subscribe, the business could use this information to expand the content of the service.
Another instance would be where the results show that many users who went to the platform and stayed on the homepage for only a short time were the least likely to subscribe. This could be an indication that they did not fancy the interface or that it was not engaging enough. Knowing this, the business can implement changes such as redesigning the homepage.
The classifier can also identify strengths that can then be reinforced. For instance, if the classifier found that the majority of people who watched trailers of upcoming television series offered on the platform went on to subscribe, the business could hone in on this and make sure that trailers are more accessible.
The algorithm will take as input data from all users, containing attributes such as the time spent on the website, the user's location, what a user spent most of their time doing on the website, when a user used the service most frequently, whether they subscribed to the mailing list, and so on. From this data, the attributes would be chosen using the previously explained selection measures.
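To make this concrete, here is a minimal Python sketch of such a classifier; every feature name and value below is a hypothetical placeholder, not real data from the service:

```python
# A hedged sketch: hypothetical free-trial user records feed a decision tree
# that predicts whether a user will subscribe (scikit-learn is assumed).
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["minutes_on_site", "watched_trailer", "on_mailing_list"]
X = [[3, 0, 0], [45, 1, 1], [60, 1, 0], [5, 0, 1], [30, 1, 1], [2, 0, 0]]
y = [0, 1, 1, 0, 1, 0]  # 1 = subscribed after the free trial, 0 = did not

model = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(model.predict([[40, 1, 0]]))                      # predict for a new user
print(export_text(model, feature_names=feature_names))  # readable tree
```

The printed tree makes it easy to read off which attribute the selection measure split on first.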
The data will need to be updated at least monthly, since subscriptions are renewed at a monthly interval. The algorithm would also need to be run within the same monthly interval and return results promptly, so the business can re-analyze the data and see whether changes made to the service are producing a positive trend in the subscriber list. If the results are positive, further optimizations could be considered to improve the service even further.
The decision tree classifier can be optimized further by ‘pruning’ (simplifying and compressing) the tree, or by bagging, which trains many trees on resampled subsets of the data and is used when the goal is to reduce the variance of a decision tree (Nagpal 2017), in order to improve accuracy.
There are other algorithms, such as random forest (a more complex configuration of decision trees) or VFDT (very fast decision tree), which improve on the decision tree by offering increased accuracy and faster handling of large datasets.
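For illustration, the bagging and random forest options could be compared with scikit-learn as follows (a sketch only; parameter values are placeholders, and VFDT itself is not part of scikit-learn):

```python
# Compare a bagged ensemble of decision trees with a random forest using
# 5-fold cross-validation on a small example dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           random_state=0)  # many trees on resampled data
forest = RandomForestClassifier(n_estimators=50, random_state=0)

for name, model in (("bagging", bagged), ("random forest", forest)):
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```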
References
Anon, 2018. Machine Learning: Pruning Decision Trees. Displayr. Available at: https://www.displayr.com/machine-learning-pruning-decision-trees/ [Accessed November 14, 2020].
Anon, 2020. C4.5 algorithm. Wikipedia. Available at: https://en.wikipedia.org/wiki/C4.5_algorithm [Accessed November 12, 2020].
Anon, 2020. Classification Algorithms in Machine Learning: How They Work. MonkeyLearn Blog. Available at: https://monkeylearn.com/blog/classification-algorithms/ [Accessed November 16, 2020].
Anon, 2020. Entropy (information theory). Wikipedia. Available at: https://en.wikipedia.org/wiki/Entropy_(information_theory) [Accessed November 15, 2020].
Anon, 2020. Information gain in decision trees. Wikipedia. Available at: https://en.wikipedia.org/wiki/Information_gain_in_decision_trees [Accessed November 16, 2020].
Anon, An Improved Algorithm of Decision Trees for Streaming Data Based on VFDT – IEEE Conference Publication. Available at: https://ieeexplore.ieee.org/document/4732288 [Accessed November 16, 2020].
Anon, Decision Tree Algorithm, Explained. KDnuggets. Available at: https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html [Accessed November 16, 2020].
Anon, Machine Learning Decision Tree Classification Algorithm – Javatpoint. www.javatpoint.com. Available at: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm [Accessed November 10, 2020].
Anon, What is a Decision Tree Diagram. Lucidchart. Available at: https://www.lucidchart.com/pages/decision-tree [Accessed November 14, 2020].
B. Azhagusundari, Antony Selvadoss Thanamani, 2013, Feature Selection Based on Information Gain.
Christopher Lee, Pankaj Nagpal, Sinead G. Ruane, Hyoun Sook Lim, 2018, Factors Affecting Online Streaming Subscriptions.
Top 10 Algorithms in Data Mining (Springer LNCS, 2008).
Dietterich, Thomas G, 2000, An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization
Ganti, A., 2020. Weighted Average Definition. Investopedia. Available at: https://www.investopedia.com/terms/w/weightedaverage.asp [Accessed November 15, 2020].
Hershy, A., 2020. Gini Index vs Information Entropy. Medium. Available at: https://towardsdatascience.com/gini-index-vs-information-entropy-7a7e4fed3fcb [Accessed November 15, 2020].
Liaw, Andy, Wiener, Matthew, 2001, Classification and Regression by RandomForest.
K, C., 2008. Thirteen ways of looking at a default. Courses. Available at: https://courses.media.mit.edu/2008fall/mas622j/Projects/CharlieCocoErnestoMatt/decision_trees/ [Accessed November 14, 2020].
King, J., 2020. How to Build Decision Trees. Medium. Available at: https://towardsdatascience.com/how-to-build-decision-trees-aa4c09114eba [Accessed November 11, 2020].
Leo Breiman, Jerome Friedman, Charles J. Stone, R.A. Olshen, 1984, Classification and Regression Trees.
Lucas, B., 2017. Cross-Validation: Estimating Prediction Error. DataScience+. Available at: https://datascienceplus.com/cross-validation-estimating-prediction-error/ [Accessed November 14, 2020].
Nagpal, A., 2017. Decision Tree Ensembles- Bagging and Boosting. Medium. Available at: https://towardsdatascience.com/decision-tree-ensembles-bagging-and-boosting-266a8ba60fd9 [Accessed November 12, 2020].
Narula, G., 2019. Machine Learning Algorithms for Business Applications – Complete Guide. Emerj. Available at: https://emerj.com/ai-sector-overviews/machine-learning-algorithms-for-business-applications-complete-guide/ [Accessed November 10, 2020].
Prasad, P. Kumar and N. Mm, “An Approach to Prediction of Precipitation Using Gini Index in SLIQ Decision Tree,” 2013 4th International Conference on Intelligent Systems, Modelling and Simulation, Bangkok, 2013, pp. 56-60, DOI: 10.1109/ISMS.2013.27.
Juluri, V. Tamarapalli and D. Medhi, “Measurement of Quality of Experience of Video-on-Demand Services: A Survey,” in IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 401-418, First quarter 2016, DOI: 10.1109/COMST.2015.2401424.
Tiemens, E. et al., 2019. Streaming service problems. The Echo. Available at: https://www.theechonews.com/article/2019/10/4htds1egqqgeg4w [Accessed November 10, 2020].
Tyagi, N., 2020. Understanding the Gini Index and Information Gain in Decision Trees. Medium. Available at: https://medium.com/analytics-steps/understanding-the-gini-index-and-information-gain-in-decision-trees-ab4720518ba8 [Accessed 2020].
Quinlan, 1986, Induction of Decision Trees.
Rudin, Cynthia, 2012, MIT 15.097 Course Notes: Decision Trees.
Aruna Devi, Dr K. Nirmala, 2013, Construction of Decision Tree: Attribute Selection Measures.
Wray Buntine, Tim Niblett, 1992, A Further Comparison of Splitting Rules for Decision-Tree Induction.
APPENDIX A
Implementation of decision trees in Python from scratch
https://dhirajkumarblog.medium.com/decision-tree-from-scratch-in-python-629631ec3e3a
https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/