Machine learning scripts include sets of training data that influence the predicted values they produce. If you test a machine learning script and the results are unsatisfactory, you can improve the training data using one of the following methods to incrementally train the script:
Note: Adding a proportionately small amount of data to a large set of existing training data has only a limited effect on the script's predictions.
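To get a rough sense of why a small addition has limited effect, consider what share of the combined training data the new rows would represent. The following is a minimal sketch for illustration only: the function name and the row counts are hypothetical, and the share of new data is a rough proxy, not a figure that the application reports.

```python
def new_data_share(existing_rows: int, added_rows: int) -> float:
    """Return the fraction of the combined training data made up of newly added rows."""
    if existing_rows < 0 or added_rows < 0:
        raise ValueError("row counts must be non-negative")
    total = existing_rows + added_rows
    if total == 0:
        return 0.0
    return added_rows / total


# Hypothetical counts: 50 corrected rows added to 10,000 existing rows.
share = new_data_share(10_000, 50)
print(f"{share:.2%}")  # 0.50%
```

In this example the new rows make up about half a percent of the combined data, which suggests why the predictions may change very little.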
Access the Test Results workspace.
By default, all rows in the grid are selected, indicating that they will be included when you incrementally train the script.
If you do not want the data in a particular row to be included in training, clear the check box in the first column to ignore that row.
Tips:
You may want to ignore certain rows if you have imbalanced data. See About Script Prediction Improvement for details.
-or-
If the value in the Prediction column is correct and you want to include the row in training, move on to the next row.
-or-
If the value in the Prediction column is wrong, mark the prediction as wrong in the Rating column.
The rating indicator in that row switches to show the new rating.
Depending on your data, one of the following results occurs:
If the value in the specified Field to Classify matches a value in the standard list, the Actual column is populated with that standard list value by default.
-or-
In the following scenarios:
...the Actual column is not populated and the cell is selected so that you can specify the correct value.
-or-
If the data type of the specified Field to Classify is Boolean (i.e., true/false), the Actual column is hidden, and the Boolean value in the hidden Actual column is populated automatically based on your selection in the Rating column.
If you indicated that the original prediction was wrong (i.e., by marking
Tip: After specifying a value in the Actual column, press Enter or Tab, or select another cell in the Actual column, to move on to the next row.
Repeat steps 2-3 until you are satisfied that all incorrect values in the Actual column on the current page have been corrected, or you have cleared the check box in rows that you want to exclude from training.
In the upper-right corner of the workspace, select
The Continue with Incremental Train? dialog box appears.
Select OK.
The data from the selected rows in the table (i.e., the data in the query or dataset and the values in the Actual column) are appended to the training data.
In the box in the upper-left corner of the workspace, select the next available set of records that you want to train.
Repeat steps 2-7 until you are satisfied with the amount of data that has been appended to the script's training data.
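The row handling described in the steps above can be sketched as follows. This is a conceptual illustration only, not the application's implementation: the `ResultRow` class, its field names, and the two functions are hypothetical. It also assumes that a prediction left as correct is used as the training label, and that for a Boolean field a prediction marked wrong resolves to the opposite value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultRow:
    """One row of the test results grid (hypothetical structure)."""
    record: dict                     # field values from the query or dataset
    prediction: object               # value the script predicted
    selected: bool = True            # check box in the first column; selected by default
    rated_wrong: bool = False        # True when the prediction was marked wrong in Rating
    actual: Optional[object] = None  # corrected value entered in the Actual column


def resolve_actual(row: ResultRow) -> object:
    """Return the label to train on for one row."""
    if not row.rated_wrong:
        # Assumption: a prediction left as correct is used as the label.
        return row.prediction
    if isinstance(row.prediction, bool):
        # Assumption: for a Boolean field, a wrong prediction implies the opposite value.
        return not row.prediction
    if row.actual is None:
        raise ValueError("a row marked wrong needs a value in the Actual column")
    return row.actual


def incremental_train(training_data: list, rows: list) -> int:
    """Append the selected rows, with their resolved labels, to the training data."""
    appended = 0
    for row in rows:
        if not row.selected:
            continue  # cleared check box: the row is ignored
        training_data.append((row.record, resolve_actual(row)))
        appended += 1
    return appended


# Hypothetical page of results: one correct row, one corrected row, one ignored row.
training_data = []
rows = [
    ResultRow({"text": "pump seal leak"}, prediction="Leak"),
    ResultRow({"text": "bearing noise"}, prediction="Leak",
              rated_wrong=True, actual="Vibration"),
    ResultRow({"text": "n/a"}, prediction="Leak", selected=False),
]
print(incremental_train(training_data, rows))  # 2
```

Only the two selected rows are appended; the row with the cleared check box is left out of training.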
Create a query or dataset of the training data that you want to add to the script's existing training data.
IMPORTANT:
The query or dataset that you select must meet the following requirements:
Contains data that is relevant to the script.
Contains labeled data where the labels correspond to values in a standard list.
Includes a significant number of records to ensure that the script has a robust set of features.
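Before you select a query or dataset, you may find it useful to check some of these requirements outside the application. The following is a minimal sketch, assuming that the records have been exported as a list of dictionaries. The function name, the `failure_mode` field, and the `min_records` threshold are hypothetical; the documentation does not state a specific minimum number of records, and relevance to the script still has to be judged manually.

```python
def check_training_set(records, label_field, standard_list, min_records):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []

    # Requirement: a significant number of records (the threshold is an assumption).
    if len(records) < min_records:
        problems.append(f"only {len(records)} records; expected at least {min_records}")

    # Requirement: the data is labeled.
    unlabeled = sum(1 for r in records if r.get(label_field) in (None, ""))
    if unlabeled:
        problems.append(f"{unlabeled} records have no label")

    # Requirement: labels correspond to values in the standard list.
    allowed = set(standard_list)
    unknown = sorted(
        {r[label_field] for r in records
         if r.get(label_field) not in (None, "") and r[label_field] not in allowed},
        key=str,
    )
    if unknown:
        problems.append(f"labels not in the standard list: {unknown}")

    return problems


# Hypothetical export with one label that is missing from the standard list.
records = [
    {"text": "pump seal leak", "failure_mode": "Leak"},
    {"text": "bearing noise", "failure_mode": "Vibration"},
    {"text": "unknown issue", "failure_mode": "Other"},
]
for problem in check_training_set(records, "failure_mode",
                                  ["Leak", "Vibration"], min_records=3):
    print(problem)
```

An empty result means only that the basic checks passed; it does not confirm that the data is suitable for the script.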
In the Workbench workspace, in the Query or Dataset box, specify the query or dataset that you created in step 1.
In the upper-right corner of the workspace, select
The Continue with Full Train? dialog box appears.
Select OK.
A notification appears, indicating that the training job has been submitted. You can navigate to other areas of the application while the records are trained. When the process is complete, the script’s history shows a status of Completed, which indicates that all the data from the query or dataset and the script's predicted values are appended to the trained data.
Copyright © 2018 General Electric Company. All rights reserved.