EWA Systems

Data Mining - Decision Trees V4.3


Overview:

EWA Systems' Decision Tree Engine is a powerful, easy-to-use, 100%-Java decision tree tool that automatically mines large, complex data sets, searching for and isolating significant patterns and relationships. This discovered knowledge is then used to generate reliable, easy-to-grasp predictive models for applications such as profiling customers, targeting direct mailings, detecting telecommunications and credit card fraud, and managing credit risk.


Background:

Decision trees are a data mining technique used for classification and regression tasks. A decision tree is a rule-based algorithm, meaning that the model it builds is based on sets of rules. Examples of such rules are "if temperature < 65" and "if color is blue". This rule-based nature is both a strength and a weakness. The strength is that the rules are easy to read and understand. The weakness is that some distinctions that drive a classification or regression model may not be easily encoded as rules.
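To make the rule idea concrete, here is a minimal Java sketch of the two example rules as predicates over a record. It assumes a record is a `Map` from attribute name to value; the `Rules` class and its methods are hypothetical illustrations, not part of the engine's API.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helpers that build simple splitting rules over a record (attribute name -> value). */
final class Rules {
    private Rules() {}

    /** Continuous rule such as "if temperature < 65". A missing or non-numeric value fails the rule. */
    static Predicate<Map<String, Object>> lessThan(String attribute, double threshold) {
        return row -> {
            Object v = row.get(attribute);
            return v instanceof Number && ((Number) v).doubleValue() < threshold;
        };
    }

    /** Categorical rule such as "if color is blue". A missing value fails the rule. */
    static Predicate<Map<String, Object>> equalsCategory(String attribute, String category) {
        return row -> category.equals(row.get(attribute));
    }
}
```

Rules combine with `Predicate.and` and `Predicate.or`, so the path from the root to any node reads as a conjunction of simple rules. This sketch simply treats a missing value as failing a rule; the engine's own missing-value handling is described further on.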

Decision trees work by incrementally segmenting the data so that each resulting segment improves the ability to predict the correct result. The output of a decision tree algorithm is typically a tree of nodes starting from a root node. Every node where a split improves the tree becomes a splitter node, which holds the splitting rule and two child nodes. Where no split improves the tree, growth stops at a terminal node. The result of this growth process is an exploratory tree, which represents the algorithm's best effort to model the learning data. Because the exploratory tree is over-specialized to the learning data, it must be pruned to make it more general before it is used on any other data.
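The node structure described above can be sketched as follows: a splitter node holds a rule and two children, a terminal node holds a prediction, and classifying a record means following rules from the root until a terminal node is reached. This is a simplified binary-tree illustration with hypothetical class names; tree growth and pruning are omitted.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Base type for a node in a hypothetical binary decision tree. */
abstract class Node {
    /** Returns the predicted class for a record (attribute name -> value). */
    abstract String predict(Map<String, Object> record);
}

/** Terminal node: growth stopped here, so it simply returns its stored prediction. */
final class TerminalNode extends Node {
    private final String prediction;

    TerminalNode(String prediction) { this.prediction = prediction; }

    @Override
    String predict(Map<String, Object> record) { return prediction; }
}

/** Splitter node: holds the splitting rule and two child nodes. */
final class SplitterNode extends Node {
    private final Predicate<Map<String, Object>> rule;
    private final Node left;   // records that satisfy the rule go here
    private final Node right;  // records that fail the rule go here

    SplitterNode(Predicate<Map<String, Object>> rule, Node left, Node right) {
        this.rule = rule;
        this.left = left;
        this.right = right;
    }

    @Override
    String predict(Map<String, Object> record) {
        // Follow the rule down one level, then recurse until a terminal node answers.
        return rule.test(record) ? left.predict(record) : right.predict(record);
    }
}
```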


Description:

EWA Systems' Decision Tree Engine is an enterprise-strength algorithm capable of solving the largest and most complex problems quickly and accurately. Its underlying methodology is mature, having been developed at Stanford University and honed on real-time e-commerce datasets and multi-terabyte semiconductor fabrication datasets, where it has earned a reputation for speed, accuracy, and reliability.

EWA Systems' Decision Tree Engine is written in 100% Java, is Java Data Mining (JDM) compliant, and is multithreaded to achieve the utmost performance.
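As an illustration of controlled multithreading, the sketch below scores candidate splitter variables on a fixed pool of exactly `nThreads` worker threads using the standard `java.util.concurrent` API. The `ParallelSplitSearch` class and the scoring function are hypothetical; the engine's own threading design is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical parallel search for the best-scoring candidate splitter. */
final class ParallelSplitSearch {
    private ParallelSplitSearch() {}

    /** Scores each candidate on a pool of exactly nThreads threads; returns the index of the first best one (-1 if none). */
    static int bestCandidate(List<String> candidates, ToDoubleFunction<String> score, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String c : candidates) {
                Callable<Double> task = () -> score.applyAsDouble(c);
                futures.add(pool.submit(task));
            }
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < futures.size(); i++) {
                double s;
                try {
                    s = futures.get(i).get(); // wait for this candidate's score
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while scoring", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("scoring failed", e.getCause());
                }
                if (s > bestScore) { // strict '>' keeps the earliest candidate on ties
                    bestScore = s;
                    best = i;
                }
            }
            return best;
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }
}
```

Using `newFixedThreadPool` rather than an unbounded pool is what keeps the thread count at the user-specified value, however many candidates there are.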

EWA Systems' Decision Tree Engine gives the user complete control over the incoming data. After all, the results can only be as good as the incoming data. Every record in both the learning and test datasets can be individually weighted. Each target state can be weighted according to a number of prior weighting schemes. Misclassification weights can be set to deter the algorithm from making certain kinds of errors. The data itself can be any combination of categorical, continuous, or ordinal variables, stored as integer, long, float, double, String, or Date values.
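Record weights and misclassification weights interact at prediction time: a node predicts the class with the lowest expected misclassification cost over the weighted records it contains. A minimal sketch, assuming a cost matrix indexed as `cost[predicted][actual]` with zeros on the diagonal; prior adjustments are left out, and the class name is hypothetical.

```java
/** Hypothetical cost-sensitive prediction for a single tree node. */
final class CostSensitivePrediction {
    private CostSensitivePrediction() {}

    /** Sums per-record weights into per-class totals; labels are 0..numClasses-1. */
    static double[] weightedCounts(int[] labels, double[] weights, int numClasses) {
        double[] counts = new double[numClasses];
        for (int r = 0; r < labels.length; r++) {
            counts[labels[r]] += weights[r];
        }
        return counts;
    }

    /**
     * cost[i][j] is the cost of predicting class i when the true class is j.
     * Returns the class i minimizing sum_j cost[i][j] * weightedCounts[j].
     */
    static int predict(double[] weightedCounts, double[][] cost) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < cost.length; i++) {
            double expected = 0.0;
            for (int j = 0; j < weightedCounts.length; j++) {
                expected += cost[i][j] * weightedCounts[j];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = i;
            }
        }
        return best;
    }
}
```

With unit costs this reduces to predicting the class with the largest weighted count; raising one off-diagonal cost shifts predictions away from that kind of error. For example, with weighted counts {3, 1}, unit costs predict class 0, but making it five times as costly to predict class 0 when the truth is class 1 switches the prediction to class 1.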

The datasets being mined need not fit in memory, since EWA Systems' Decision Tree Engine includes an efficient disk-caching mechanism, so the problem size is limited only by the available storage space on the local hard drive, a RAID array, or even a Storage Area Network. The datasets can even be left in the database, since EWA Systems' Decision Tree Engine interfaces with any JDBC-compliant database.

EWA Systems' Decision Tree Engine is based on an overgrow-then-prune methodology. The algorithm supports both binary and multi-way decision trees. Users can combine any of the included splitting and pruning techniques (splitting criteria include Gini, SymGini, Twoing, Ordered Twoing, Entropy, and CHAID for classification trees, and Least Squared Error and Least Absolute Deviation for regression trees) or build their own splitter or pruning algorithm for their specific task. For example, Support Vector Machines or neural networks may be used to create a splitting rule. Additionally, EWA Systems' Decision Tree Engine is a look-ahead algorithm, so it is less prone to the pitfalls of greedy, one-split-at-a-time tree growth. Each variable can be individually weighted according to its usefulness as a splitter.
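Two of the classification splitting criteria named above, Gini and Entropy, can be written in a few lines. The sketch computes each impurity from (weighted) class counts and scores a binary split by its impurity decrease: the parent's impurity minus the size-weighted impurities of the two children. It is a textbook formulation for illustration, not the engine's implementation, and `SplitCriteria` is a hypothetical name.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical textbook impurity measures for classification splits. */
final class SplitCriteria {
    private SplitCriteria() {}

    /** Gini impurity: 1 - sum_j p_j^2, where p_j is the share of class j. */
    static double gini(double[] counts) {
        double total = sum(counts);
        if (total == 0) return 0.0;
        double s = 0.0;
        for (double c : counts) {
            double p = c / total;
            s += p * p;
        }
        return 1.0 - s;
    }

    /** Entropy in bits: -sum_j p_j log2 p_j; empty classes contribute nothing. */
    static double entropy(double[] counts) {
        double total = sum(counts);
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Impurity decrease of a binary split: i(parent) - pL * i(left) - pR * i(right). */
    static double improvement(double[] left, double[] right, ToDoubleFunction<double[]> impurity) {
        double[] parent = new double[left.length];
        for (int j = 0; j < left.length; j++) {
            parent[j] = left[j] + right[j];
        }
        double n = sum(parent);
        return impurity.applyAsDouble(parent)
                - sum(left) / n * impurity.applyAsDouble(left)
                - sum(right) / n * impurity.applyAsDouble(right);
    }

    private static double sum(double[] a) {
        double s = 0.0;
        for (double x : a) s += x;
        return s;
    }
}
```

A split that separates two classes perfectly, for example left counts {4, 0} and right counts {0, 4}, yields an improvement of 0.5 under Gini and 1 bit under Entropy.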

Missing values can be handled either with the standard surrogate technique or with any of the other EWA techniques, such as neural or Bayesian networks, which can greatly improve accuracy when mining extremely sparse datasets. The presence of missing values can also be used to reduce the value of splitting rules based on these sparse variables.
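For the standard surrogate technique, a candidate surrogate split is judged by how often it sends records the same way as the primary split. The sketch below computes that agreement and the CART-style predictive association, which compares the surrogate's error with simply sending every record to the primary split's majority side. It assumes both splits' directions are known for the same records; the class name is hypothetical.

```java
/** Hypothetical surrogate-split scoring for records where the primary splitter's value is known. */
final class Surrogates {
    private Surrogates() {}

    /** Fraction of records that the primary and surrogate splits send in the same direction. */
    static double agreement(boolean[] primaryLeft, boolean[] surrogateLeft) {
        int same = 0;
        for (int i = 0; i < primaryLeft.length; i++) {
            if (primaryLeft[i] == surrogateLeft[i]) same++;
        }
        return (double) same / primaryLeft.length;
    }

    /**
     * Predictive association: (min(pL, pR) - surrogateError) / min(pL, pR).
     * Values <= 0 mean the surrogate is no better than always choosing the majority side.
     */
    static double association(boolean[] primaryLeft, boolean[] surrogateLeft) {
        int left = 0;
        for (boolean b : primaryLeft) {
            if (b) left++;
        }
        double pL = (double) left / primaryLeft.length;
        double minP = Math.min(pL, 1.0 - pL);
        if (minP == 0.0) return 0.0; // degenerate primary split: nothing to improve on
        double error = 1.0 - agreement(primaryLeft, surrogateLeft);
        return (minP - error) / minP;
    }
}
```

When a record is missing the primary splitter's value, the surrogate with the highest positive association decides which child it goes to.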

EWA Systems' Decision Tree Engine produces detailed reports concerning the data, its analysis, and the resulting model. These reports are available directly from the model or from its object or XML persistence, which covers the common XML standards. Reports include the engine configurations used, a univariate analysis of each data variable, and full details of the optimal tree, the exploratory tree, and the optimal tree for every other terminal node count. For each tree, there is a summary report detailing the resulting accuracy, the attribute importance list, and the confusion matrix. For each node in the tree, there is a summary report of the records included at that node and the suggested prediction should the tree terminate at that node. Splitter nodes also include the splitting rule, its competitors, and its surrogate rules, along with the statistics and various improvement measures for each of these rules.
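The accuracy and confusion matrix in the tree summary reports can be computed as below, assuming classes are encoded as integers `0..numClasses-1`. This is a generic sketch with a hypothetical class name, not the engine's report format.

```java
/** Hypothetical evaluation helpers for a classification tree. */
final class Evaluation {
    private Evaluation() {}

    /** Builds matrix[actual][predicted], counting each record once. */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    /** Accuracy: the share of records on the diagonal, i.e. correctly classified. */
    static double accuracy(int[][] m) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

Off-diagonal cells show which kinds of errors the tree makes, which is what the misclassification weights described earlier are meant to influence.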

The optional graphical user interface visualizes the whole process from the data import and preparation, to the engine configuration and runtime monitoring, to the resulting model.

Applications:

Industries using EWA Systems' Decision Tree Engine include telecommunications, banking, financial services, manufacturing, retail and catalog sales, and education.

Applications span:
• Marketing: market segmentation, customer profiling, retention/attrition analyses
• Direct Mail: market segment profitability, campaign targeting, response prediction
• Financial Services: credit card scoring, fraud detection (e.g., see Fleet Bank success story)
• Manufacturing: assembly line failures, quality control


Performance:

In a third-party single-threaded comparison, EWA's Decision Tree Engine performed similarly to its C-based competitors and was much faster than other Java implementations. EWA's Decision Tree Engine also supports multithreading, which further accelerates its performance. All implementations achieved similar error rates.

Run time and memory:

Data Set        EWA Systems (Java)   WEKA (Java)                  RuleQuest (C)
Sleep           17s with 40MB        5m52s with 240MB             10s with 12MB
Forest Cover    11m with 180MB       Not enough RAM to finish     6m33s with 140MB

Feature List:

  • Implemented in 100% Java, with performance similar to C.
  • Unlimited Problem Size (Problem does not have to fit in memory)
  • Controlled Multi-Threaded Implementation (Uses only the number of threads specified)
  • Uses EWA's Standard Data Manipulation Package
  • Uses EWA's Standard Data Preparation Package
    • Supports Separate Learning/Testing Datasets
    • Supports Single Learning/Testing Dataset
      • Dedicated Test Set
      • Verification Folding
    • Supports Data Subsampling
      • Sampling from dataset's start, end, evenly, or randomly
  • Configuration Options
    • Categorical and Continuous Binning
    • Prior Modifications
      • Based on Learning or Test Dataset, a Mixture of Both, or as Specified
    • Misclassification Cost
      • Either Unit, or as Specified
    • Tree Topology Limits
      • Max Tree Depth
      • Max Number of Terminal Nodes
      • Atomic Node Size (Via # of Records, or % of Root Size)
      • Child Node Size Limits (Via # of Records, or % of Parent Size)
    • Splitter Types
      • Classification Trees by Gini, Twoing, Ordered Twoing, CHAID
      • Regression Trees by Least Squared Error, Least Absolute Deviation
      • Custom User-Defined Splitters
      • Look-Ahead Splitters
    • Pruning
      • Exploratory Trees
      • Standard Error Pruning
      • Custom User-Defined Pruning
    • Node Information
      • Number of Competitors
      • Number of Surrogates
        • Surrogate Importance Factor
      • Missing Value Factors and Cutoff
    • Results
      • Decision Tree (All best-of-size trees from single node to the optimal tree size and the exploratory tree)
      • Decision Rules (All rule sets equivalent to the above tree models)
      • Confusion Matrix
      • Variable Importance List
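The feature list does not define Standard Error Pruning; assuming it refers to the familiar one-standard-error rule from CART, selecting the final tree from the best-of-size sequence looks roughly like this: find the tree with the lowest test error, then pick the smallest tree whose error lies within one standard error of that minimum. The class and method names are hypothetical.

```java
/** Hypothetical one-standard-error selection over a sequence of pruned trees. */
final class OneStandardErrorRule {
    private OneStandardErrorRule() {}

    /** Standard error of a misclassification rate r estimated on n test records: sqrt(r(1 - r) / n). */
    static double standardError(double r, int n) {
        return Math.sqrt(r * (1.0 - r) / n);
    }

    /**
     * errors[k] is the test error of the k-th best-of-size tree, ordered from smallest to largest tree.
     * Returns the index of the smallest tree whose error is within one standard error of the minimum.
     */
    static int select(double[] errors, int nTest) {
        int minIdx = 0;
        for (int k = 1; k < errors.length; k++) {
            if (errors[k] < errors[minIdx]) minIdx = k;
        }
        double threshold = errors[minIdx] + standardError(errors[minIdx], nTest);
        for (int k = 0; k <= minIdx; k++) {
            if (errors[k] <= threshold) return k; // first (smallest) tree under the threshold
        }
        return minIdx;
    }
}
```

With test errors {0.30, 0.21, 0.20, 0.22} on 100 test records, the standard error at the minimum is 0.04, so the second tree (error 0.21) is chosen over the slightly more accurate but larger third tree. With 10,000 test records the standard error shrinks to 0.004 and the third tree is kept.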


EWA Systems' Decision Tree Engine User's Manual (Request)

EWA Systems' Decision Tree Engine JavaDocs (Request)


Competitors List

We invite you to check out our competitors. We know you will be satisfied with our features, performance and price.

Competitors with decision trees not integrated with other algorithms:

Competitors with decision trees integrated with other algorithms:


To Purchase or For More Information, contact our Sales Team.

Copyright © 2005 by EWA Systems, Inc.