Since the launch of Watson Machine Learning, a variety of additional nodes have been added to the DSX Canvas for the SPSS Modeler runtime. These nodes give data scientists building flows a significant amount of new functionality. They include Auto Classifier and Auto Numeric, which automatically train many models and select the top performers, saving the time it would take to train and compare multiple models by hand. Anomaly Detection is another new node; it identifies unusual records, which is valuable in applications such as fraud detection. Read on to learn about the newest nodes and how they can be used in your flows.
Append
This node allows multiple data sets to be appended together, stacking their records into a single output (similar to ‘UNION’ in SQL). For example, a customer with sales data in a separate file for each month can combine those files into a single view of sales over several years.
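The Append node is configured visually on the Canvas, but the operation it performs is easy to sketch in plain Python. This is illustrative only: `append_datasets` and the sample monthly records are hypothetical, and each monthly file is assumed to have already been read into a list of dictionaries. The sketch keeps every record (the behavior of SQL's `UNION ALL`; plain `UNION` would also drop duplicate rows) and fills any field that one source lacks with `None`.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def append_datasets(*datasets: List[Record]) -> List[Record]:
    """Hypothetical helper: stack record lists into one, aligning fields by name.

    Every record is kept (like SQL UNION ALL). Fields missing from a source
    are filled with None so that all output records share the same fields.
    """
    # Collect the union of field names, preserving first-seen order.
    fields: List[str] = []
    for dataset in datasets:
        for record in dataset:
            for name in record:
                if name not in fields:
                    fields.append(name)

    combined: List[Record] = []
    for dataset in datasets:
        for record in dataset:
            combined.append({name: record.get(name) for name in fields})
    return combined


# Hypothetical monthly sales extracts.
january = [{"month": "2017-01", "region": "East", "sales": 120.0}]
february = [
    {"month": "2017-02", "region": "East", "sales": 95.5},
    {"month": "2017-02", "region": "West", "sales": 210.0, "promo": True},
]

all_sales = append_datasets(january, february)
for row in all_sales:
    print(row)
```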
LSVM (Linear Support Vector Machine)
This is a classification algorithm that is particularly suited for use with wide data sets, that is, those with a large number of predictor fields.
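To show what a linear SVM learns, here is a minimal sketch that trains one with Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is a generic textbook technique, not necessarily what the LSVM node uses internally; the function names, the regularization strength `lam`, and the toy data are all assumptions. The model is one weight per field plus a bias, so its size grows linearly with the number of predictors, which is part of why linear SVMs are often applied to wide data.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Sketch: linear SVM via Pegasos-style stochastic subgradient descent.

    Labels must be -1 or +1. A constant 1.0 is appended to every record so
    the last weight acts as a bias (it is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regularization step: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify by the sign of the weighted sum (bias included)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Hypothetical "wide" toy data: six predictor fields, label driven by field 0.
X = [
    [2.0, 0.1, 0.0, 0.3, 0.0, 0.2],
    [1.5, 0.0, 0.2, 0.1, 0.1, 0.0],
    [1.8, 0.2, 0.1, 0.0, 0.3, 0.1],
    [-2.0, 0.1, 0.0, 0.2, 0.1, 0.0],
    [-1.6, 0.0, 0.3, 0.1, 0.0, 0.2],
    [-1.9, 0.2, 0.1, 0.0, 0.2, 0.1],
]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```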
PCA/Factor (Principal Components Analysis)
This node aims to reduce the complexity of data by finding a smaller number of derived fields that effectively summarize the information in the original set of fields.
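The idea behind principal components can be sketched in a few lines: center the fields, form their covariance matrix, and find the direction of greatest variance. The sketch below uses power iteration to extract only the first component; it is not the node's implementation, and the function name and sample data are hypothetical. It assumes the fields are not all constant (otherwise the normalization step would divide by zero).

```python
import math
from typing import List, Sequence, Tuple


def pca_first_component(
    rows: Sequence[Sequence[float]], iterations: int = 100
) -> Tuple[List[float], List[float], float]:
    """Return (PC1 direction, PC1 score per record, share of variance explained)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each record's value on the new derived field.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    pc_variance = sum(s * s for s in scores) / (n - 1)
    total_variance = sum(cov[j][j] for j in range(d))
    return v, scores, pc_variance / total_variance


# Two strongly correlated fields (hypothetical measurements).
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, scores, explained = pca_first_component(rows)
print(direction, explained)
```

Here a single derived field keeps almost all of the variance of the two original fields, which is the kind of summarization the node aims for.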
Another new node supports survival analysis, that is, estimating the probability that an event has occurred by a given time. For example, a company might model the time to churn to determine which factors are associated with customers who quickly switch to another service.
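The node's modeling method is not detailed here, so the sketch below uses the Kaplan-Meier estimator, a basic nonparametric survival technique that ignores predictor fields, to show what "probability of an event by time t" means and how customers who have not churned yet (censored records) are handled. Finding the factors associated with quick churn, as in the example, would require a regression-style survival model such as Cox regression. The function names and tenure data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], churned: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, S(t)) at each observed churn time.

    S(t) estimates the probability that a customer has NOT churned by time t.
    Customers with churned=False are censored: they count as at risk up to
    their last observed time, then drop out without contributing an event.
    """
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, churned) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def churn_probability_by(curve: List[Tuple[float, float]], t: float) -> float:
    """1 - S(t): estimated probability of having churned by time t."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return 1.0 - s


# Hypothetical tenure data in months; False means still a customer (censored).
months = [2, 3, 3, 5, 6, 8, 10, 12]
churned = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(months, churned)
print(curve)
print("P(churned by month 6):", churn_probability_by(curve, 6))
```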
K-Means
This is an unsupervised algorithm that clusters a dataset into distinct groups. Instead of trying to predict an outcome, K-Means tries to uncover patterns in the set of input fields. Records are grouped so that records within the same group, or cluster, tend to be similar to each other, while records in different groups are dissimilar.
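The grouping described above is usually computed with Lloyd's algorithm, which alternates between assigning each record to its nearest cluster center and moving each center to the mean of its records. The sketch below implements that loop in plain Python. It is not the node's implementation: the deterministic farthest-first seeding, the function names, and the sample points are assumptions chosen to keep the example reproducible.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_first_init(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(
    points: Sequence[Point], k: int, max_iter: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label per point)."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical two-field data with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels, centroids)
```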
Anomaly Detection
This node identifies outliers, or unusual cases, in the data. Unlike other modeling methods that store rules about unusual cases, anomaly detection models store information about what normal behavior looks like. This makes it possible to identify outliers even when they do not conform to any known pattern, which is particularly useful in applications such as fraud detection.
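The idea of storing a model of normal behavior, rather than rules about unusual cases, can be illustrated with a deliberately simple profile: the mean and standard deviation of each field in historical data. New records are then scored by how many standard deviations they sit from that profile. This is a stand-in technique, not the node's own method; the function names, field names, data, and the cut-off `THRESHOLD` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Record = Dict[str, float]


def fit_normal_profile(records: Sequence[Record]) -> Dict[str, Tuple[float, float]]:
    """Learn what 'normal' looks like: mean and standard deviation per field."""
    profile = {}
    for field in records[0]:
        values = [r[field] for r in records]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        profile[field] = (mean, math.sqrt(variance))  # assumes variance > 0
    return profile


def anomaly_index(record: Record, profile: Dict[str, Tuple[float, float]]) -> float:
    """Root-mean-square z-score of the record against the normal profile."""
    zs = [(record[f] - mean) / sd for f, (mean, sd) in profile.items()]
    return math.sqrt(sum(z * z for z in zs) / len(zs))


# Hypothetical history of ordinary card transactions for one customer.
history = [
    {"amount": 42.0, "items": 3.0},
    {"amount": 38.5, "items": 2.0},
    {"amount": 51.0, "items": 4.0},
    {"amount": 45.0, "items": 3.0},
    {"amount": 40.0, "items": 3.0},
]
profile = fit_normal_profile(history)

incoming = [{"amount": 44.0, "items": 3.0}, {"amount": 950.0, "items": 40.0}]
THRESHOLD = 3.0  # hypothetical cut-off; would be tuned in practice
flagged = [anomaly_index(r, profile) > THRESHOLD for r in incoming]
print(flagged)  # [False, True]
```

No rule about fraud is written anywhere; the second transaction is flagged only because it is far from the learned profile.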
Auto Classifier
This node builds several classification models using multiple algorithms and settings, evaluates them, and selects the best-performing ones. The selected models can then be used to score new data, and combining (“ensembling”) their results can produce a more accurate prediction.
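The build, evaluate, select, and ensemble loop can be sketched with a few toy classifiers. In this sketch, which is not the node's implementation, every candidate is trained on the same data, ranked by accuracy on a holdout set, and the top `keep` models vote on each new record. The candidate algorithms (a majority-class baseline, a nearest-centroid classifier, and single-field decision stumps), the function names, and the churn data are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], str]
Trainer = Callable[[List[Row], List[str]], Model]


def train_majority(X: List[Row], y: List[str]) -> Model:
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def train_nearest_centroid(X: List[Row], y: List[str]) -> Model:
    """Predict the label whose mean training record is closest."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row: Row) -> str:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return predict


def make_stump_trainer(feature: int) -> Trainer:
    """One-split 'decision stump' on a single field (two-class labels only)."""

    def train(X: List[Row], y: List[str]) -> Model:
        means = {
            lab: sum(x[feature] for x, t in zip(X, y) if t == lab) / y.count(lab)
            for lab in sorted(set(y))
        }
        low, high = sorted(means, key=means.get)
        cut = (means[low] + means[high]) / 2
        return lambda row: high if row[feature] > cut else low

    return train


def auto_classify(candidates: Dict[str, Trainer], X_train, y_train, X_valid, y_valid, keep=3):
    """Train every candidate, rank by holdout accuracy, vote with the top `keep`."""
    scored = []
    for name, trainer in candidates.items():
        model = trainer(X_train, y_train)
        hits = sum(model(x) == t for x, t in zip(X_valid, y_valid))
        scored.append((hits / len(y_valid), name, model))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    top = scored[:keep]

    def ensemble(row: Row) -> str:
        votes = Counter(model(row) for _, _, model in top)
        return votes.most_common(1)[0][0]

    return [(name, acc) for acc, name, _ in top], ensemble


# Hypothetical churn data: field 0 is informative, field 1 is noise.
X_train = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.2], [0.2, 0.6], [0.1, 0.4], [0.3, 0.9]]
y_train = ["churn", "churn", "churn", "stay", "stay", "stay"]
X_valid = [[0.85, 0.5], [0.75, 0.9], [0.15, 0.2], [0.25, 0.5]]
y_valid = ["churn", "churn", "stay", "stay"]

candidates = {
    "majority": train_majority,
    "nearest_centroid": train_nearest_centroid,
    "stump_field0": make_stump_trainer(0),
    "stump_field1": make_stump_trainer(1),
}
leaderboard, ensemble = auto_classify(candidates, X_train, y_train, X_valid, y_valid)
print(leaderboard)
print(ensemble([0.1, 0.1]), ensemble([0.95, 0.95]))
```

Because the holdout set is used to choose the models, a separate test set would be needed for an unbiased estimate of the ensemble's accuracy.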
Auto Numeric
This is the equivalent of the Auto Classifier for numeric (continuous) targets.
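For a continuous target, the build, evaluate, select, and combine loop can be sketched by ranking candidate models by root-mean-square error (RMSE) on a holdout set and averaging the predictions of the best ones. This is a sketch, not the node's implementation; the candidate models (a mean baseline and single-field least-squares lines), the function names, and the data are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]
Trainer = Callable[[List[Row], List[float]], Model]


def train_mean(X: List[Row], y: List[float]) -> Model:
    """Baseline: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda row: mean


def make_linear_trainer(feature: int) -> Trainer:
    """Least-squares line y = intercept + slope * x on a single field."""

    def train(X: List[Row], y: List[float]) -> Model:
        xs = [row[feature] for row in X]
        x_bar, y_bar = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        return lambda row: intercept + slope * row[feature]

    return train


def rmse(model: Model, X: List[Row], y: List[float]) -> float:
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


def auto_numeric(candidates: Dict[str, Trainer], X_train, y_train, X_valid, y_valid, keep=2):
    """Train every candidate, rank by holdout RMSE, average the top `keep`."""
    scored = []
    for name, trainer in candidates.items():
        model = trainer(X_train, y_train)
        scored.append((rmse(model, X_valid, y_valid), name, model))
    scored.sort(key=lambda s: s[0])  # lower error ranks first
    top = scored[:keep]

    def ensemble(row: Row) -> float:
        return sum(model(row) for _, _, model in top) / len(top)

    return [(name, err) for err, name, _ in top], ensemble


# Hypothetical data: two noisy measurements of the same underlying quantity.
X_train = [[1.2, 0.8], [1.8, 2.2], [3.1, 2.9], [3.9, 4.1], [5.0, 5.0]]
y_train = [3.0, 5.0, 7.0, 9.0, 11.0]
X_valid = [[1.8, 1.2], [3.3, 3.7], [4.6, 4.4]]
y_valid = [4.0, 8.0, 10.0]

candidates = {
    "mean": train_mean,
    "linear_field0": make_linear_trainer(0),
    "linear_field1": make_linear_trainer(1),
}
leaderboard, ensemble = auto_numeric(candidates, X_train, y_train, X_valid, y_valid)
print(leaderboard)
print("ensemble RMSE on the holdout:", rmse(ensemble, X_valid, y_valid))
```

In this toy data the two single-field models err in opposite directions, so their average scores better than either one; that figure is measured on the same holdout used for selection and is therefore optimistic.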
GLE (Generalized Linear Engine)
This node builds generalized linear models, supporting both classification and continuous predicted values. Unlike ordinary linear regression, it does not require the target to be normally distributed.
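As a concrete example of a target that is not normally distributed, the sketch below fits a Poisson regression with a log link, one member of the generalized linear model family, to count data using Newton-Raphson. It covers only this one distribution and link with a single predictor and is not the GLE node's implementation; the function name and the claims data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_poisson_glm(
    x: Sequence[float], y: Sequence[float], max_iter: int = 100, tol: float = 1e-12
) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1 * x by maximum likelihood (Newton-Raphson).

    For the Poisson distribution with its canonical log link, Newton-Raphson
    is equivalent to iteratively reweighted least squares (IRLS).
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[s0, s1], [s1, s2]].
        s0 = sum(mu)
        s1 = sum(xi * mi for xi, mi in zip(x, mu))
        s2 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = s0 * s2 - s1 * s1
        # Newton step: solve the 2x2 system information * step = score.
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (-s1 * g0 + s0 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical claim counts versus years since last inspection.
years = [0, 1, 2, 3, 4, 5]
claims = [1, 1, 2, 4, 6, 11]
b0, b1 = fit_poisson_glm(years, claims)
print(f"log(E[claims]) = {b0:.3f} + {b1:.3f} * years")
print("rate ratio per extra year:", round(math.exp(b1), 3))
```

Because the coefficients sit on the log scale, exp(b1) is the multiplicative change in the expected count for each additional year, rather than an additive change as in ordinary linear regression.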
In summary, the new nodes extend the Data Science Experience by supporting automatic multi-model building and by opening up new application areas such as survival analysis and anomaly detection.