IBM Bluemix and Watson Machine Learning (WML)
First you’ll need an IBM Bluemix account (register for a free account here). Once you have a Bluemix account, you’ll need to add a Watson Machine Learning (WML) service to your account. You can find the WML service from the Bluemix catalog by filtering on
Once the WML service has been added, you’ll see the Machine Learning service added to your Bluemix services dashboard.
Click on the Machine Learning service on the services dashboard. Now on the left side bar, click on Service credentials. This is where you can manage and view your WML service credentials. If credentials haven’t been created, you can create them now by clicking the New credential button. Once you have service credentials, click on the View credentials action to see the credential details in JSON format. The credential attributes that will be required for scoring data using the deployed WML endpoints are the URL, username and password.
Click Manage in the left side bar, then click the Launch Dashboard button in the Watson Machine Learning card.
Now click the Deployments tab to see a list of deployed WML scoring endpoints. Currently, training and saving WML models can be done in Scala, in Python and through the DSX Model builder UI. At the time of this writing, there is no R support for training and saving WML models to the WML service.
Once you have some model deployments (using whatever means available), click on one of the deployments in the table to see a deployment overview. Under the API Details section is the Scoring Endpoint, which is the endpoint you’ll use to score data.
Score data with WML using R
Now that you have some WML endpoints deployed, it’s time to use them to score data. Fire up your R tool of choice. I’m going to choose the RStudio tool embedded in the IBM Data Science Experience (you can get a free DSX account here), but you can use a Jupyter R notebook, RStudio or something else.
Create a file called wml_config.R and provide the URL, username and password WML credential attributes. This is also a good place to define your deployed ML scoring endpoints.
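For reference, here is a minimal sketch of what wml_config.R might contain. The variable names match the ones used in the scoring code in this post; every value is a placeholder that you would replace with your own service credentials and the Scoring Endpoint from the deployment overview.

```r
# wml_config.R (sketch: all values below are placeholders, not real credentials)

# WML service credentials, copied from the View credentials JSON
watson_ml_creds_url      <- '<url from your service credentials>'
watson_ml_creds_username <- '<username from your service credentials>'
watson_ml_creds_password <- '<password from your service credentials>'

# Deployed scoring endpoint, copied from the API Details section of a deployment
ml_endpoint.naive_bayes <- '<scoring endpoint of your deployment>'
```

Since this file holds a password, it is probably best kept out of version control.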
Now it’s time to write the R code that will score your data. I’m going to create a new R notebook in RStudio. First, I’m going to install and use a WML scoring helper library called R4WML, source the wml_config.R file (created in the step above) to bring in the authentication and deployed endpoints, then use the URL, username and password to create WML authentication headers. We’ll be using these authentication headers to access the scoring endpoints.
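# Install the R4WML helper library from GitHub (requires the devtools package)
devtools::install_github(repo = 'IBMDataScience/R4WML')
library(R4WML)

# Bring in the WML credentials and the deployed scoring endpoints
source('wml_config.R')

# Create the authentication headers used by the scoring calls
watson_ml_creds_auth_headers <- get_wml_auth_headers(watson_ml_creds_url, watson_ml_creds_username, watson_ml_creds_password)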
I’ll show two examples of how to prepare a data payload for scoring.
First, we’ll use one of the WML scoring endpoints to score a two-record payload that is created by hand:
payload <- to_wml_payload(
  data.frame(
    CASENO = c('1', '2'),
    KEYWORDS = c('Financial', 'Criminal Activity'),
    FINDINGS_OF_FACT = c('test payload 1', 'test payload 2')
  )
)
payload_scored <- from_wml_payload(wml_score(ml_endpoint.naive_bayes, watson_ml_creds_auth_headers, payload))
View(payload_scored[,c('CASENO', 'predictedLabel', 'probability')])
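To make the payload step less of a black box, here is a rough sketch of the kind of JSON body I understand a WML scoring endpoint to accept: a list of field names plus one array of values per record. This is an assumption about the wire format, not a description of R4WML’s internals; sketch_payload is a hypothetical helper written for this illustration, and it relies on the jsonlite package being installed.

```r
library(jsonlite)  # assumption: jsonlite is installed; R4WML may serialize differently

# The same two hand-built records as in the example above
records <- data.frame(
  CASENO = c('1', '2'),
  KEYWORDS = c('Financial', 'Criminal Activity'),
  FINDINGS_OF_FACT = c('test payload 1', 'test payload 2'),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of R4WML): turn a data frame into a
# fields/values JSON document, one inner array per row
sketch_payload <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    unname(as.list(df[i, , drop = FALSE]))
  })
  toJSON(list(fields = names(df), values = rows), auto_unbox = TRUE)
}

payload_json <- sketch_payload(records)
cat(payload_json, "\n")
```

In practice you would keep using to_wml_payload; the sketch is only meant to show roughly what travels over the wire.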
Second, we’ll score the top 150 records from a CSV file:
data <- read.csv(file='2004-cases.csv')
data <- head(data, n=150)
payload <- to_wml_payload(data)
results <- wml_score(ml_endpoint.naive_bayes, watson_ml_creds_auth_headers, payload)
payload_scored <- from_wml_payload(results)
View(payload_scored[,c('CASENO', 'predictedLabel', 'probability')])
I use RStudio’s View feature to browse the scored results.
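If you are not working in RStudio, or just want a quick summary instead of browsing rows, a sketch like the following counts how many records received each predicted label. The scored data frame here is a small stand-in with made-up values so the snippet runs on its own; in the notebook you would use payload_scored instead.

```r
# Stand-in for payload_scored (values are made up for illustration only)
scored <- data.frame(
  CASENO = c('1', '2', '3'),
  predictedLabel = c('Financial', 'Criminal Activity', 'Financial'),
  stringsAsFactors = FALSE
)

# Count records per predicted label, most frequent first
label_counts <- sort(table(scored$predictedLabel), decreasing = TRUE)
print(label_counts)
```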
Below is my complete notebook.
After running the notebook, here are the results of scoring the top 150 records from the CSV file.