Machine Learning Scripts

Configure and Initially Train a New Machine Learning Script

Before You Begin

Create a new script.
Create a query or dataset of your data that you want to use to initially train the script. See the table below for details.
Create a query or dataset of your data that you want to use to test (and eventually incrementally train) the script. This query should include test data for which you want to predict a certain value.

Procedure

Access the script that you want to configure and initially train.

Configure the script as described in the following table.


Section	Description	Required or Optional
Standard List	Select the standard list that is relevant to the script that you are testing.	Required
Query or Dataset	Select to browse the Catalog and select the query or dataset that you want to use to train the script. The query or dataset that you select must meet the following requirements: Contains data that is relevant to the script. Contains data in which you have a high degree of confidence. Contains labeled data where the labels correspond to values in a standard list. Includes a significant number of records to ensure that the script has a robust set of features. Notes: Fields that are used in a query but hidden from display will be ignored by the script. Hyperlinks defined for columns in a query will appear on the Test Results tab. This feature allows you to easily access associated records while testing a script. For example, a column containing work order numbers could be configured with a link to the full work order. Ensure that the query you use is a Select query. Other types of queries will cause errors to occur. Tip: For example, to test the IsAFailure.py script, the query should include the short and long descriptions of the work history event, the work order priority, and the breakdown indicator.	Required
Field to Classify	Select a column from the query or dataset that contains the correct values to use to improve the script's predicted values. Tip: For example, when testing the IsAFailure.py script, select the breakdown indicator column as the result column.	Required
Standard List Reference Field	Select a column from the query or dataset that contains values that identify sub-sets of the specified Standard List (e.g., Equipment Class values). When the script is run, values in the specified field are compared to values in the List Reference field in Classifier Standard List records to determine which sub-set(s) of the script’s Standard List to use.	Optional
Classifier Input Fields	Select the check box for each column in the query that you want to use as an input to the script. Tip: By default, the field selected in the Standard List Reference Field box is not included as an input to the script. If you want the script to process values in this field, include the column twice in your query or dataset.	Required

The script is configured.

Select .
Select .

The Continue with Full Train? dialog box appears.
Select OK.

The script processes the data and the initial set of training data is created.

Tip: When the training process is complete, you can select to test the script, and then view the results on the Test Results workspace to ensure that they are as expected. See Test A Machine Learning Script for details about the information that appears in the Test Results tab.
In the Workbench workspace, in the Query or Dataset box, modify the query or dataset to one which includes test data for which you want to predict a certain value.
Select

Configure a Machine Learning Script That Has Initial Training Data

Before You Begin

Create a machine learning script by copying an existing script that contains initial training data.
Create a query or dataset of data that you want to use to test (and eventually incrementally train) the script. See the table below for details.

Procedure

Access the script that you want to configure.

Configure the script as described in the following table.


Section	Description	Required or Optional
Standard List	Select the standard list that is relevant to the script that you are testing.	Required
Query or Dataset	Select to browse the Catalog and select the query or dataset that you want to use to test (and incrementally train) the script. The query or dataset that you select must meet the following requirements: Contains data that is relevant to the script. Contains data for which you want to predict a certain value. See About Prediction Improvement for details. Notes: Fields that are used in a query but hidden from display will be ignored by the script. Hyperlinks defined for columns in a query will appear on the Test Results tab. This feature allows you to easily access associated records while testing a script. For example, a column containing work order numbers could be configured with a link to the full work order. Ensure that the query you use is a Select query. Other types of queries will cause errors to occur. Tip: For example, to test the IsAFailure.py script, the query should include the short and long descriptions of the work history event, the work order priority, and the breakdown indicator.	Required
Field to Classify	Select a column from the query or dataset that contains the current values that the script is attempting to predict. Tip: For example, when testing the IsAFailure.py script, select the breakdown indicator column as the result column.	Required
Standard List Reference Field	Select a column from the query or dataset that contains values that identify sub-sets of the specified Standard List (e.g., Equipment Class values). When the script is run, values in the specified field are compared to values in the List Reference field in Classifier Standard List records to determine which sub-set(s) of the script’s Standard List to use.	Optional
Classifier Input Fields	Select the check box for each column in the query that you want to use as an input to the script. Tip: By default, the field selected in the Standard List Reference Field box is not included as an input to the script. If you want the script to process values in this field, include the column twice in your query or dataset.	Required

The script is configured.

Select .
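
The Standard List Reference Field behavior described in the table above can be pictured with a short sketch. It is illustrative only and is not the application's implementation: the StandardListRecord class, the candidate_values function, and the sample values are hypothetical, and the fallback to the whole list when a row has no reference value is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StandardListRecord:
    """Hypothetical stand-in for a Classifier Standard List record."""
    value: str                     # a value the script may predict
    list_reference: Optional[str]  # sub-set the value belongs to, e.g. an Equipment Class


def candidate_values(records: List[StandardListRecord],
                     reference_value: Optional[str]) -> List[str]:
    """Return the standard list values whose List Reference matches the row's value.

    Assumption: a row without a reference value is matched against the whole list.
    """
    if reference_value is None:
        return [r.value for r in records]
    return [r.value for r in records if r.list_reference == reference_value]


# Made-up sample data for illustration.
standard_list = [
    StandardListRecord("Bearing Failure", "Pump"),
    StandardListRecord("Seal Leak", "Pump"),
    StandardListRecord("Winding Fault", "Motor"),
]

pump_values = candidate_values(standard_list, "Pump")
all_values = candidate_values(standard_list, None)
```

In this sketch, a row whose reference field contains Pump is classified only against the two Pump values, which is the narrowing effect that the optional field is meant to provide.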

Test A Machine Learning Script

Before You Begin

Configure a Machine Learning Script that Has Initial Training Data.

-or-

Configure and Initially Train a New Machine Learning Script.

Procedure

Access the script that you want to test.
Select .
-or-

Select the Test Results tab.

The script processes the data and then predicts the result for each record. The Test Results workspace appears with 50 results per page. By default, all rows in the grid are selected, indicating that they will be included when you incrementally train the script.

Tip: To view a single row of the test results, select . To navigate through the test results, in the Details window, select and .

The following columns appear in the grid:
- Prediction: Predicted value produced by the script.
- Score: The score of the prediction produced by the script. Scores are presented as a percentage between 0.0% and 100.0%.
- Rating: Column in which you can confirm or reject the script's prediction. See Incrementally Train a Machine Learning Script for details.
- Actual: Column in which you can provide the correct value for the prediction to improve the training data. This column is disabled until you reject the script's prediction in the Rating column.
  
  Note: If the data type of the script's predicted value is Boolean (i.e., true/false), the Actual column is hidden, and the Boolean value in the hidden Actual column is populated automatically based on your selection in the Rating column.
- A column for each field you selected in the Classifier Input Fields section. The column for the field that you selected in the Field to Classify box on the Workbench tab appears first.
Review the results to ensure that they are as expected. If the results are not satisfactory, you can incrementally train the script.
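
As a reading aid, the grid behavior described above can be modeled roughly as follows. This is a sketch under assumptions, not the product's data model: the ResultRow class and its field names are hypothetical, and the score is assumed to be stored as a fraction between 0 and 1 before it is displayed as a percentage.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 50  # the Test Results workspace shows 50 results per page


@dataclass
class ResultRow:
    """Hypothetical model of one row in the Test Results grid."""
    prediction: str
    score: float                  # assumed to be a fraction in [0, 1]
    selected: bool = True         # rows are included in training by default
    rejected: bool = False        # Rating: False = confirmed, True = rejected
    actual: Optional[str] = None  # corrected value, if one has been entered

    def score_text(self) -> str:
        """Format the score as a percentage with one decimal place."""
        return f"{self.score * 100:.1f}%"

    def actual_is_editable(self) -> bool:
        """The Actual column stays disabled until the prediction is rejected."""
        return self.rejected


def page_count(total_rows: int) -> int:
    """Number of pages needed to show all rows (ceiling division)."""
    return -(-total_rows // PAGE_SIZE)


row = ResultRow(prediction="Yes", score=0.8734)
```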

What To Do Next

Incrementally train the script

Incrementally Train a Machine Learning Script

Machine learning scripts include sets of training data that influence the predicted values they produce. If you test a machine learning script and the results are unsatisfactory, you can improve the training data using one of the following methods to incrementally train the script:

Add training data from the Test Results workspace by evaluating test results and adding reviewed data to the existing training data.
Add training data from the Workbench workspace by adding new training data to the existing training data via a query or dataset.

Note: Adding a proportionately small amount of data to a large set of existing trained data will have a limited impact on improving the script.
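
A quick calculation illustrates the note above. The function name and the record counts are made up for illustration, and the share of new records is only a rough proxy: the real effect on predictions depends on the script and on the data.

```python
def new_data_share(existing_records: int, added_records: int) -> float:
    """Return the fraction of the combined training data that is newly added."""
    total = existing_records + added_records
    if total == 0:
        return 0.0
    return added_records / total


# 500 new records added to 100,000 existing ones: about 0.5% of the total.
small = new_data_share(100_000, 500)

# The same 500 records added to 1,000 existing ones: about 33% of the total.
large = new_data_share(1_000, 500)
```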

Before You Begin

Test a machine learning script.

Steps: Add reviewed data to the existing training data

Access the Test Results workspace.

By default, all rows in the grid are selected, indicating that they will be included when you incrementally train the script.
If you do not want the data in a particular row to be included in training, clear the check box in the first column to ignore that row.
Tip:
- You can select the check box in the heading row of the grid to ignore all data on the current page.
- You may want to ignore certain rows if you have imbalanced data. See About Script Prediction Improvement for details.
-or-

If the value in the Prediction column is correct and you want to include the row in training, move on to the next row.

-or-

If the value in the Prediction column is wrong, select the icon in the Rating column. The icon changes to indicate that the original prediction was wrong.

Depending on your data, one of the following results occur:
- If the value in the specified Field to Classify matches a value in the standard list, the Actual column is populated with that standard list value by default.
  
  -or-
- In the following scenarios:
  - If the value in the specified Field to Classify does not match a value in the standard list.
  - If the values already matched but you indicated that the prediction was wrong.
  - If there is no value in the specified Field to Classify.
  ...the Actual column is not populated and the cell is selected so that you can specify the correct value.
  
  -or-
- If the data type of the specified Field to Classify is Boolean (i.e., true/false), the Actual column is hidden, and the Boolean value in the hidden Actual column is populated automatically based on your selection in the Rating column.
If you indicated that the original prediction was wrong (i.e., by marking ) and the data type of the Field to Classify is not Boolean, specify the correct value in the Actual column.
Tip: After specifying a value in the Actual column, press Enter or Tab, or select another cell in the Actual column, to move on to the next row.
Repeat steps 2-3 until you are satisfied that all incorrect values in the Actual column on the current page have been corrected, or you have cleared the check box in rows that you want to exclude from training.
In the upper-right corner of the workspace, select .
The Continue with Incremental Train? dialog box appears.
Select OK.
The data from the selected rows in the table (i.e., the data in the query or dataset and the values in the Actual column) are appended to the training data.
In the box in the upper-left corner of the workspace, select the next available set of records that you want to train.
Repeat steps 2-7 until you are satisfied with the amount of data that has been appended to the script's training data.
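
The rules in the steps above for how the Actual column is filled in after you reject a prediction can be summarized in a short sketch. The default_actual function is hypothetical and is not the application's code. Two points are assumptions: "the values already matched" is read as the Field to Classify value being equal to the prediction, and rejecting a Boolean prediction is taken to imply the opposite value.

```python
from typing import Optional, Sequence, Union

Value = Union[str, bool]


def default_actual(prediction: Value,
                   field_value: Optional[Value],
                   standard_list: Sequence[Value],
                   is_boolean: bool) -> Optional[Value]:
    """Return the default Actual value after a rejection.

    None means the cell is left empty so that the user can enter the value.
    """
    if is_boolean:
        # Assumption: the hidden Actual value is the opposite of the rejected prediction.
        return not prediction
    if field_value is None:
        return None  # no value in the Field to Classify
    if field_value not in standard_list:
        return None  # value does not match the standard list
    if field_value == prediction:
        return None  # values already matched, yet the prediction was rejected
    return field_value  # a valid standard list value becomes the default


# Made-up standard list for illustration.
values = ["Bearing Failure", "Seal Leak", "Winding Fault"]

prefilled = default_actual("Seal Leak", "Bearing Failure", values, is_boolean=False)
needs_input = default_actual("Seal Leak", "Seal Leak", values, is_boolean=False)
```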

Steps: Add additional new training data via a query or dataset

Create a query or dataset of the training data that you want to add to the script's existing training data.
Important:
The query or dataset that you select must meet the following requirements:
- Contains data that is relevant to the script.
- Contains data in which you have a high degree of confidence.
- Contains labeled data where the labels correspond to values in a standard list.
- Includes a significant number of records to ensure that the script has a robust set of features.
Access the machine learning script to which you want to add the training data from the query or dataset.
In the Workbench workspace, in the Query or Dataset box, specify the query or dataset that you created in Step 1.
In the upper-right corner of the workspace, select .
The Continue with Full Train? dialog box appears.
Select OK.
A notification appears, indicating that the training job has been submitted. You can navigate to other areas of the application while the records are trained. When the process is complete, the script’s history shows a status of Completed, which indicates that all the data from the query or dataset and the script's predicted values are appended to the trained data.

Fully Retrain a Machine Learning Script

Before You Begin

Create a query or dataset of the training data that you want to use to replace the script's existing training data.
Important:
The query or dataset that you select must meet the following requirements:
- Contains data that is relevant to the script.
- Contains data in which you have a high degree of confidence.
- Contains labeled data where the labels correspond to values in a standard list.
- Includes a significant number of records to ensure that the script has a robust set of features.
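
Some of the requirements above can be checked before you train, for example on an export of the query results. The following is a minimal sketch, assuming that the rows are available as dictionaries. The check_training_rows function, the field names, and the min_records threshold are hypothetical; the documentation does not state how many records count as significant.

```python
from typing import Dict, List, Optional, Sequence


def check_training_rows(rows: List[Dict[str, Optional[str]]],
                        label_field: str,
                        standard_list: Sequence[str],
                        min_records: int) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(rows) < min_records:
        problems.append(f"only {len(rows)} records; expected at least {min_records}")
    allowed = set(standard_list)
    for index, row in enumerate(rows):
        label = row.get(label_field)
        if label is None:
            problems.append(f"row {index}: missing label")
        elif label not in allowed:
            problems.append(f"row {index}: label {label!r} is not in the standard list")
    return problems


# Made-up rows for illustration.
rows = [
    {"description": "Pump seized", "breakdown": "Yes"},
    {"description": "Routine inspection", "breakdown": "No"},
    {"description": "Noise reported", "breakdown": "Maybe"},
    {"description": "Oil top-up", "breakdown": None},
]

problems = check_training_rows(rows, "breakdown", ["Yes", "No"], min_records=3)
```

A check like this covers only the label and record-count requirements. Whether the data is relevant and trustworthy still needs to be judged by someone who knows the data.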

Procedure

Access the machine learning script for which you want to replace the existing training data.
Select the Statistics tab.

The Statistics section appears.
Next to the count of trained records, select .

The Are you sure? dialog box appears.
Select OK.

A message appears, indicating that the trained data has been deleted.
Select the Workbench tab.

The Workbench workspace appears.
In the Query or Dataset box, select to browse the Catalog and select the query or dataset that you want to use to replace the script's existing training data.

The Query or Dataset box is populated with your selection.
In the upper-right corner of the workspace, select .

The Continue with Full Train? dialog box appears.
Select OK.

The script processes the data and the set of training data is replaced with data from the specified query or dataset.
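
The difference between incremental training and a full retrain, as described in the procedures above, comes down to whether the existing trained records are kept. The following sketch models only that bookkeeping. The TrainingData class is hypothetical and does not represent how the application stores or processes training data.

```python
from typing import List


class TrainingData:
    """Hypothetical container for a script's trained records."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def train(self, new_records: List[str]) -> int:
        """Append records to whatever is already trained; return the new count."""
        self.records.extend(new_records)
        return len(self.records)

    def delete_all(self) -> None:
        """Discard every trained record, as done on the Statistics tab."""
        self.records.clear()


data = TrainingData()
data.train(["a", "b", "c"])               # initial training
count_after_incremental = data.train(["d"])  # incremental: existing records are kept

data.delete_all()                          # full retrain, step 1: delete the trained data
count_after_retrain = data.train(["x", "y"])  # step 2: train on the replacement data
```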