Data Mining Functions
Classification
Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.
Classification models include decision trees, rule-based models, neural networks, support vector machines, and other machine learning algorithms. Classification models are often used in fraud detection, anti-money laundering (AML), and other compliance applications.
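As a rough illustration of predicting a target class, the sketch below uses a nearest-centroid classifier. That technique is swapped in for brevity and is not one of the algorithms named above. The applicant features (debt-to-income ratio, number of late payments), the training rows, and the function names are all hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the class whose centroid is closest to the given feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical applicants: (debt-to-income ratio, number of late payments).
train_rows = [(0.10, 0), (0.15, 1), (0.35, 2), (0.40, 3), (0.70, 6), (0.80, 7)]
train_labels = ["low", "low", "medium", "medium", "high", "high"]

centroids = fit_centroids(train_rows, train_labels)
print(predict(centroids, (0.75, 6)))  # high
```

In practice the features would need to be scaled to comparable ranges, because the distance here is dominated by the late-payment count.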
Regression
Regression is a statistical technique that models the relationship between a dependent variable (often called a response variable) and one or more independent variables (often called predictor variables), and uses that model to make predictions. In its simplest form, regression analysis estimates the value of a response variable y from the value of a single predictor variable x.
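The simplest case, one response variable y and one predictor variable x, can be sketched with an ordinary least-squares line fit. The function names and the data points below are hypothetical and chosen so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_y(intercept, slope, x):
    """Estimate the response variable for a new predictor value."""
    return intercept + slope * x


# Hypothetical observations that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)                 # 1.0 2.0
print(predict_y(intercept, slope, 6))   # 13.0
```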
Association
In general, association rules are If/Then statements that help uncover relationships between seemingly unrelated data. For example, a retail store might use association rules to identify customers who purchase a particular item and then also buy a related item. This information can then be used to place the items together in the store or even offer a discount on the second item to encourage customers to buy both items.
Other examples of where association rules might be used include:
-Marketing: identify customers who tend to buy certain products together so that related products can be marketed together
-Fraud detection: detect unusual patterns of behavior that might indicate fraud
-Biology: find genes that tend to be activated or suppressed together
-Web usage mining: analyze web logs to find pages that are viewed together frequently
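An If/Then rule of this kind is commonly scored by its support (how often the items appear together) and its confidence (how often the Then part holds when the If part does). The text above does not name these measures, so treat the sketch below as one conventional way to evaluate a rule. The transactions and function names are hypothetical.

```python
def count_containing(transactions, items):
    """Count the transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the If part, the fraction that also contain the Then part."""
    matches_if = count_containing(transactions, antecedent)
    if matches_if == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / matches_if


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter", "eggs"},
]

# Rule: If a customer buys bread, Then they also buy butter.
print(support(transactions, {"bread", "butter"}))       # 0.6
print(confidence(transactions, {"bread"}, {"butter"}))  # 0.75
```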
Sequence
Sequence analysis is a data mining function that finds patterns in events that occur in a particular order over a period of time. Because these patterns show which events tend to follow others, this function is often used to predict future events.
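One very simple way to find such time-ordered patterns is to count which event most often follows each event and use the most frequent follower as the prediction. The sketch below does only that; real sequence mining algorithms are considerably more elaborate. The browsing sessions and function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_transitions(sequences):
    """For each event, count the events that immediately follow it."""
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, event):
    """Return the most frequent follower of `event`, or None if it was never followed."""
    followers = transitions.get(event)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical page-view sessions, each listed in time order.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["search", "product", "reviews"],
    ["home", "search", "product", "cart", "checkout"],
]

transitions = count_transitions(sessions)
print(predict_next(transitions, "product"))   # cart
print(predict_next(transitions, "checkout"))  # None
```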
To recap the functions described above: classification assigns items in a dataset to one or more predefined classes and is often used to assign customers to known segments.
Regression predicts the value of a dependent variable from the values of one or more independent variables and is often used to forecast future sales or trends.
Clustering
Clustering is a data mining function that groups the items in a dataset by similarity, without relying on predefined classes. This function is often used to find groups of customers with similar characteristics.
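Grouping by similarity can be sketched with k-means, one common clustering algorithm that the text above does not name. The customer features (age, annual spend in thousands), the starting centroids, and the function name are hypothetical. Starting centroids are passed in explicitly so that the result is reproducible.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age, annual spend in thousands).
customers = [(22, 1.0), (25, 1.5), (24, 1.2), (58, 8.0), (61, 9.0), (60, 8.5)]

# Assumption: use the first and fourth customers as starting centroids.
final_centroids, clusters = k_means(customers, [customers[0], customers[3]])
for cluster in clusters:
    print(cluster)
```

With this data the younger, lower-spending customers end up in one cluster and the older, higher-spending customers in the other.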