What is Data Mining?
Data mining is the process of searching large databases to find information that is useful for decision making.
It can be understood as the technology and software used to find patterns of behavior within a database. The underlying premise is that these patterns support decision making. For example, it can help companies understand the behavior patterns of their customers, which makes it easier to establish strategies to increase sales or reduce costs.
Advantages of Data Mining:
The fundamental advantage of this data analysis process is the large number of business scenarios to which it can be applied. Some examples are:
- Prediction: Forecast of the company’s sales.
- Probability: Selection of the best clients for a direct contact either by phone or email.
- Sequence analysis: Analysis of the products that customers have purchased and of the relationships between them.
Top Data Mining Techniques:
To apply Data Mining techniques or algorithms, you first need a minable view. That is, the data must be prepared and described, and you need to know the type of data in order to select a technique.
The method to use depends on the problem that you want to solve. You have to carry out different analyses to find the appropriate algorithm. The following list shows some example problems.
- Predict how long a customer will take to pay (late payment)
- Know who the customers are
- Identify the buyer profile of a certain product
- Detect networks of taxpayers who commit fraud
- Find customers who intend to leave your service
These problems generally fall into two types: descriptive problems and predictive problems.
Data Mining Techniques for Descriptive Problems
The objective of solving descriptive problems is to find a description of the data. For example, it is common to group customers with similar characteristics in order to send them more personalized notifications. Another kind of description is finding associations between products that are sold together. The data mining techniques for descriptive problems are the following:
Clustering
The intention of this technique is to find similar, homogeneous groups in the data. To solve it, unsupervised learning models are built.
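The grouping idea above can be sketched with k-means, one common unsupervised algorithm. The text does not name a specific algorithm, so this choice is an assumption, as are the customer data, the function name `kmeans`, and the simplification of using the first `k` points as initial centroids.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: the first k points serve as initial centroids (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical customers described by (age, monthly spend).
customers = [(22, 30), (25, 35), (24, 28), (51, 210), (48, 190), (55, 205)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

On this toy data the three younger, low-spend customers end up in one group and the three older, high-spend customers in the other. In practice the features would usually be scaled first so that no single variable dominates the distance.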
Association rules
The goal is to discover previously unknown relationships in the data that make sense. The best-known classic example is market basket analysis in a supermarket. It is said that Walmart applied models to its transactions and discovered that on Fridays a group of shoppers who bought beer also bought diapers. The company reportedly placed the diapers near the beer and thereby increased sales of both products.
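Relationships of this kind are usually quantified with support, confidence, and lift. The following is a minimal sketch of those three measures. The baskets are invented for illustration and are not real retail data, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
]

rule_support = support(baskets, {"beer", "diapers"})          # 2 of 5 baskets
rule_confidence = confidence(baskets, {"beer"}, {"diapers"})  # 2 of 3 beer baskets
rule_lift = lift(baskets, {"beer"}, {"diapers"})
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests the two products appear together more often than they would if they were purchased independently.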
Data Mining Techniques for Predictive Problems
The aim is to obtain models that can be applied to future data, essentially to predict behaviors. In artificial intelligence these are called supervised learning models. The variables used can be categorical or numerical.
Classification
This refers to models in which the variable to predict takes a finite set of defined values, so categorical variables are used. For example, in a study I carried out on an event, the Week of the Entrepreneur, visitors could be classified by type, such as entrepreneur or spectator. Another example is predicting whether or not a customer is going to buy a certain product.
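As a sketch of predicting a categorical outcome such as "will this customer buy?", the example below uses a k-nearest-neighbors classifier. The algorithm choice, the features, and the training rows are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the majority label among the k training rows nearest to `query`."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (age, store visits per month) -> bought the product?
train = [
    ((25, 1), "no"), ((30, 2), "no"), ((28, 1), "no"),
    ((45, 8), "yes"), ((50, 9), "yes"), ((47, 7), "yes"),
]

likely_buyer = knn_predict(train, (46, 8))
unlikely_buyer = knn_predict(train, (27, 2))
print(likely_buyer, unlikely_buyer)
```

The model is supervised because every training row carries a known label, and the prediction is always one of those predefined classes.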
Regression
Here the aim is to predict values, so mostly numerical variables are used. Often what you get is the probability of an event: the probability that a customer will keep their credit card, or that they will acquire an additional card.
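Predicting a numerical value can be sketched with a simple least-squares line fit. This is only one of many possible models, and the monthly sales figures are invented so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales per month (exactly linear: 10 + 2 * month).
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

When the target is the probability of an event rather than an unbounded number, a model such as logistic regression is typically used instead of a straight line.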
Classification according to Focus
According to one scholar, “in practice, perhaps, one of the most interesting classifications of data mining algorithms is the one that corresponds to their function”. On that basis they can be classified as follows:
- Classifiers. Classify data into predefined classes.
- Regression algorithms. Generate a predictive function from the data.
- Discovery of association rules. Search for relationships between variables.
- Dependency modeling. Generation of models that explain the dependencies between attributes.
- Grouping. Creation of groups when classes are unknown.
- Case-based learning. Based on indexing and remembering the most significant cases, so that new cases are classified according to the closest stored case.
- Compaction. Search for more compact descriptions of the data. Dimension reduction techniques.
- Deviation detection. Search for the most significant deviations of the data with respect to previous values.
- Summary. Describes the properties shared by the observations that belong to the same class.
If you also consider algorithms that support the preliminary tasks of preprocessing and data preparation, you can add:
- Multivariate visualization techniques.
- Algorithms for the detection and elimination of atypical data (outliers).
- Algorithms for detecting missing data and filling it in.
The first group covers the data exploration process (Exploratory Data Analysis, EDA) through iterative and visual techniques, which give insight into data structures, domains, outliers, etc.
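The two preparation steps in the list above, filling in missing data and detecting atypical values, can be sketched as follows. Median imputation and the 1.5 x IQR rule are common conventions chosen here as assumptions, and the readings are invented.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, factor=1.5):
    """Return values lying more than `factor` * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if v < low or v > high]

# Hypothetical readings with one missing entry and one suspicious value.
readings = [10, 12, 11, None, 13, 12, 95]

filled = fill_missing(readings)
outliers = iqr_outliers(filled)
print(filled, outliers)
```

Whether a flagged value should be removed or kept is a judgment call that depends on the domain; the rule only marks candidates for review.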
Data Mining Algorithms
Decision trees
As the name suggests, a decision tree is a sequence of decisions organized hierarchically, like the branches of a tree. These algorithms accept both numerical and categorical data and are often applied to classification, grouping and forecasting tasks. If they predict categories they are usually called classification trees. If they predict numerical values they are called regression trees.
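A single decision in such a tree can be sketched as choosing the threshold that best separates the classes. The example below scores candidate splits with Gini impurity, which is one common criterion and an assumption here. The data and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = None
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: days a payment was late, and whether the customer left.
days_late = [0, 2, 3, 15, 20, 30]
churned = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(days_late, churned)
print(threshold, impurity)
```

A full tree repeats this search recursively on each resulting branch until the branches are pure enough or a depth limit is reached.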
Artificial neural networks
These algorithms are very powerful and can model virtually any type of problem. They are used for classification, prediction and grouping tasks. One disadvantage is that neural networks work only with numerical data, so categorical variables are usually converted to numeric form before these algorithms are applied. Networks can be conceived as graphs with nodes and links. They are organized in layers: the first is the input layer, the following ones are called hidden layers, and the last is the output layer.
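The layered structure and the need for numerical inputs can be illustrated with a forward pass through a tiny network. This is a sketch only: the weights are arbitrary and untrained, the one-hot encoding and sigmoid activation are assumptions, and every name is hypothetical.

```python
import math

def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 indicators."""
    return [1.0 if value == c else 0.0 for c in categories]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Input layer: a categorical variable (one-hot encoded) plus one numeric feature.
segments = ["retail", "wholesale", "online"]
x = one_hot("online", segments) + [0.5]

# Arbitrary, untrained weights: 4 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.2, -0.4, 0.6, 0.1], [-0.3, 0.5, 0.1, 0.8]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

hidden = dense(x, hidden_weights, hidden_biases)
output = dense(hidden, output_weights, output_biases)
print(x, hidden, output)
```

Training would adjust the weights so that the output matches known examples; that step is omitted here.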
Principal Component Analysis
It is a multivariate technique used to reduce the dimensionality of a data set. That is, if we have a large number of variables, the aim is to obtain a small set of new variables, the principal components, that is sufficient to represent the information in the data. This algorithm works with numerical variables.
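For two variables the first principal component can be computed in closed form from the covariance matrix, which makes the idea easy to see without a linear algebra library. The sketch below assumes exactly two non-constant numerical variables. The data are invented and lie exactly on a line, so a single component captures all the variance.

```python
import math

def pca_2d(xs, ys):
    """First principal component of two variables.

    Returns (unit direction, explained variance ratio, projected scores).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    scores = [(x - mx) * direction[0] + (y - my) * direction[1]
              for x, y in zip(xs, ys)]
    return direction, lam / (a + c), scores

# Hypothetical data: the second variable is exactly twice the first.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

direction, explained, scores = pca_2d(xs, ys)
print(direction, explained)
```

Here the two original variables can be replaced by the single column `scores` with no loss of information. With real, noisy data the explained variance ratio is below 1 and indicates how much is retained.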
Factor analysis
Like principal component analysis, factor analysis aims to reduce the dimension of the data. It produces combinations of the original variables, the factors, that condense the complete information. The algorithm is designed to work with quantitative variables.
Correspondence analysis
If you need to solve dimensionality problems with categorical variables, correspondence analysis is the appropriate technique. It has two variants: simple correspondence analysis, which evaluates two variables and is based on the contingency table, and multiple correspondence analysis, which considers more than two variables and is based on Burt’s table.
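The contingency table on which the two-variable case is based can be sketched as follows, together with the chi-square statistic. Dividing that statistic by the number of observations gives the total inertia that correspondence analysis decomposes. The survey pairs and the function names are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row category, column category) observations."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def chi_square(table):
    """Chi-square statistic against the independence model."""
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical observations: (region, preferred product).
pairs = (
    [("north", "A")] * 3 + [("north", "B")] * 1
    + [("south", "A")] * 1 + [("south", "B")] * 3
)

rows, cols, table = contingency_table(pairs)
chi2 = chi_square(table)
total_inertia = chi2 / len(pairs)
print(table, chi2, total_inertia)
```

This covers only the input and the overall association measure. The full technique goes on to derive low-dimensional coordinates for the row and column categories.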
Multidimensional scaling
It is used to represent graphically, through a perceptual map, the similarities between the objects of a data cloud, based on their relative positions. It closely resembles cluster analysis. The difference is that in this model the variables that determine similarity are not known, while in cluster analysis they are.