If the predictions produced by a fuzzy logic or machine learning script do not meet the expected results, you can take action to improve the script's predictions. The method by which you improve a script's predictions depends upon the type of script.
To improve a machine learning script's ability to predict target values, you can train the script by adding training data.
Training data includes relevant data for the specific machine learning script that enables the script to make consistent predictions. To train a machine learning script, you can create a query or dataset that includes relevant data and the corresponding correct target value. The fields of the relevant data provide text from which features can be produced. The script evaluates the specific features against the corresponding correct target value to help predict future target values.
The following table shows example fields that provide features and the target value of the isAFailure.py script.
Fields that Provide Features | | | | Target Value |
---|---|---|---|---|
Event Short Description | Event Long Description | Failure Mode Description | Priority Description | Breakdown Indicator |
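As a rough illustration of how text fields can yield features, the following sketch joins the feature fields of one training record and counts its words. The record values, the dictionary layout, and the `extract_features` helper are hypothetical, and the sketch assumes that Breakdown Indicator is the target value column. The actual feature extraction performed by isAFailure.py is not described here and may differ.

```python
import re
from collections import Counter

# Field names taken from the table above; treating the last one as the
# target value is an assumption of this sketch.
FEATURE_FIELDS = [
    "Event Short Description",
    "Event Long Description",
    "Failure Mode Description",
    "Priority Description",
]
TARGET_FIELD = "Breakdown Indicator"


def extract_features(record):
    """Join the feature fields and count lowercase word tokens (hypothetical helper)."""
    text = " ".join(str(record.get(field, "")) for field in FEATURE_FIELDS)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Illustrative record; the values are invented for this sketch.
record = {
    "Event Short Description": "Pump seal leak",
    "Event Long Description": "Seal leak found on pump during inspection",
    "Failure Mode Description": "Leak",
    "Priority Description": "High",
    "Breakdown Indicator": True,
}

features = extract_features(record)  # word -> number of occurrences
target = record[TARGET_FIELD]        # the correct target value for this record
```

Each training record therefore pairs a set of word features with one known target value.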
The impact of the training data you add to a machine learning script depends upon the amount of training data you are adding in proportion to the existing amount of training data. For example, if you add 100 records of training data to a machine learning script that has 100,000 records of training data, the impact is minimal and the improvement may not be noticeable. However, if you add 50,000 records of training data to the existing 100,000 records, the impact will be significant and the improvement will be obvious.
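The proportions in this example can be checked with a few lines of arithmetic. The `share_of_new_data` function is a hypothetical helper written for this illustration; it only computes what fraction of the combined training set the new records represent.

```python
def share_of_new_data(existing_records, added_records):
    """Return the fraction of the combined training set made up of new records."""
    return added_records / (existing_records + added_records)


small = share_of_new_data(100_000, 100)     # adding 100 records
large = share_of_new_data(100_000, 50_000)  # adding 50,000 records

print(f"{small:.2%}")  # 0.10%
print(f"{large:.2%}")  # 33.33%
```

With 100 new records, only about one record in a thousand is new, whereas 50,000 new records make up a third of the combined set.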
If you are very confident in a training set of data, you can completely replace the existing training data with your training data set.
When developing training data, follow these important principles:
The training data should have a good distribution of labels (i.e., target values).
For example, if a machine learning script will predict a target value of either True or False, the training data should not consist of 95% True records and 5% False records; this condition is known as class imbalance. Instead, the training data should be spread more evenly so that it includes a significant number of samples of both True and False target values.
Tip: If you have imbalanced data (i.e., many more records with one or two labels than with the rest), you can improve the model predictions by leaving out the records with the overrepresented labels and incrementally training several times with the remaining records, thereby oversampling the infrequent labels.
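Before you add training data, it can help to measure the label distribution. The following sketch reports the share of each label and then balances the records by plain random oversampling, that is, by duplicating records with infrequent labels. Both helpers and the sample records are hypothetical, and duplicating records is a generic substitute used here for illustration; it is not the incremental training mechanism of the product.

```python
import random
from collections import Counter


def label_distribution(labels):
    """Return the fraction of records carrying each label."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def oversample(records, label_key, seed=0):
    """Duplicate randomly chosen records of the smaller classes until
    every label has as many records as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)
    largest = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=largest - len(group)))
    return balanced


# Invented data set: 95 records labeled True and 5 labeled False.
records = [{"text": f"event {i}", "target": i < 95} for i in range(100)]

before = label_distribution([r["target"] for r in records])
balanced = oversample(records, "target")
after = label_distribution([r["target"] for r in balanced])
```

In this sketch, `before` shows the 95% to 5% split and `after` shows an even split across 190 records.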
The training data should be compatible with the input data used for predictions.
The training data should include similar words and phrases to those that are expected to be found in the input data. For example, if the training data only has words and phrases in English but the expected input data has words and phrases in Spanish, the training data is not compatible.
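One simple way to gauge compatibility is to measure how much of the input vocabulary also appears in the training data. The following is a minimal sketch under that assumption; the `vocabulary` and `overlap_ratio` helpers and the sample texts are hypothetical, and a low ratio is only a warning sign, not a definitive test.

```python
import re


def vocabulary(texts):
    """Collect the set of lowercase words found in the given texts."""
    words = set()
    for text in texts:
        words.update(re.findall(r"[^\W\d_]+", text.lower()))
    return words


def overlap_ratio(training_texts, input_texts):
    """Return the fraction of input words that also occur in the training data."""
    input_vocab = vocabulary(input_texts)
    if not input_vocab:
        return 0.0
    return len(input_vocab & vocabulary(training_texts)) / len(input_vocab)


training_texts = ["pump seal leak", "motor bearing failure"]

english_ratio = overlap_ratio(training_texts, ["seal leak on pump"])
spanish_ratio = overlap_ratio(training_texts, ["fuga en el sello de la bomba"])
```

Here the English input shares three of its four words with the training data, while the Spanish input shares none.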
The initial set of training data should include a significant number of records to ensure that the machine learning script has a robust set of features.
Adding training data after the machine learning script has been trained initially does not add new features. If you are adding training data for the first time to a new machine learning script or you are removing the existing training data and replacing it with new training data, you should use a query or dataset with a significant number of records to create the initial set of training data. If you train a machine learning script on an initial training set with just a few records, the script will not produce accurate predictions.
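The effect of a fixed feature set can be pictured with a toy model. The sketch below assumes that features are words and that the vocabulary is frozen by the first training call; the `FixedVocabularyModel` class is hypothetical and is not the implementation used by the machine learning scripts.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FixedVocabularyModel:
    """Toy model whose vocabulary is set once, by the initial training call."""

    def __init__(self):
        self.vocabulary = None
        self.counts = {}  # word -> Counter of target values seen with it

    def train(self, records):
        """Train on an iterable of (text, target) pairs."""
        records = list(records)
        if self.vocabulary is None:
            # Initial training: every word seen here becomes a feature.
            self.vocabulary = {w for text, _ in records for w in tokenize(text)}
        for text, target in records:
            for word in tokenize(text):
                # Later training updates known features only.
                if word in self.vocabulary:
                    self.counts.setdefault(word, Counter())[target] += 1


model = FixedVocabularyModel()
model.train([("pump seal leak", True), ("routine inspection", False)])
model.train([("gearbox leak", True)])  # "gearbox" is not a known feature
```

After the second call, the counts for `leak` are updated, but `gearbox` is ignored because it was absent from the initial training set. This is why a small initial set limits every later improvement.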
Fuzzy logic scripts rely on a standard list of standard values in the Classifier Standard List family to produce predictions. If the predictions produced by a fuzzy logic script are invalid, you can modify the standard list of standard values that is referenced by the script in one of the following ways:
Add missing standard values. If the standard list does not contain a necessary standard value, it is impossible for the fuzzy logic script to use that value as a prediction. You can use Record Manager to add standard values, and their mapped words, to a standard list.
Add or modify the mapped words of an existing standard value. If the standard value exists in the standard list, verify that its mapped words match the words that are used in your data. If they do not match, you can use Record Manager to add or modify the mapped words for that standard value.
Modify the weight of mapped words. If your standard list contains both the correct standard values and mapped words that match your data but the prediction is consistently the wrong target value, you can use Record Manager to increase or decrease the weight of a mapped word. When the fuzzy logic script processes the input data, it creates scores for potential matches. In the case of a tie score, the script will select the word with the higher weight.
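The role of the weight as a tie-breaker can be sketched as follows. The list layout, the sample entries, the `best_match` helper, and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration; the scoring method of the actual fuzzy logic scripts is not described here.

```python
from difflib import SequenceMatcher

# Hypothetical standard list: each entry maps one word to a standard value.
STANDARD_LIST = [
    {"standard_value": "Bearing Failure", "mapped_word": "bearing", "weight": 2},
    {"standard_value": "Imbalance", "mapped_word": "vibration", "weight": 5},
    {"standard_value": "Leak", "mapped_word": "leak", "weight": 1},
]


def best_match(text, standard_list):
    """Return the standard value whose mapped word best matches the text.
    Equal scores are resolved in favor of the higher weight."""
    tokens = text.lower().split()

    def rank(entry):
        score = max(
            (SequenceMatcher(None, token, entry["mapped_word"]).ratio()
             for token in tokens),
            default=0.0,
        )
        # Tuples compare element by element: score first, then weight.
        return (round(score, 6), entry["weight"])

    return max(standard_list, key=rank)["standard_value"]


tied = best_match("bearing vibration high", STANDARD_LIST)
fuzzy = best_match("bearng noise", STANDARD_LIST)
```

In the first call, both `bearing` and `vibration` match exactly, so the higher weight decides in favor of Imbalance. Raising the weight of `bearing` above 5 would change that result to Bearing Failure. In the second call there is no tie, so the misspelled `bearng` resolves to Bearing Failure on score alone.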
Copyright © 2018 General Electric Company. All rights reserved.