AI at Talenom: Active Learning
One of the most common use cases of machine learning is automating work that was previously done manually. Learning is typically based on training data where a decision (output) is made from a set of conditions (inputs). After training, the machine can make the correct decision with varying accuracy: higher for the easier cases and lower for the harder ones. This expected accuracy is typically available for each decision, allowing users to set a certainty level above which they are comfortable trusting the algorithm, and to delegate decisions falling below that threshold to manual processing. This procedure is also in place at Talenom, where machine learning is used to automate the posting of invoices.
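As a minimal sketch of such threshold-based routing (the model interface, the invoice object, and the 0.95 threshold are hypothetical placeholders, not our production setup):

```python
# Minimal sketch of confidence-threshold routing. The model API,
# the invoice object, and the threshold value are hypothetical.
CONFIDENCE_THRESHOLD = 0.95

def route_invoice(model, invoice):
    """Post automatically when the model is confident enough;
    otherwise send the invoice to manual processing."""
    label, confidence = model.predict(invoice)  # assumed to return (label, probability)
    if confidence >= CONFIDENCE_THRESHOLD:
        return "auto", label    # machine posts the invoice
    return "manual", None       # a human accountant decides
```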
Problem
Illustration of how machine learning (ML) used for automation can lead to degradation of accuracy. 1) A new dataset is presented for the ML to classify. 2) The ML handles easy cases automatically but 3) is unsure about some. 4) The unsure cases are directed to human classification. 5) The new labeled dataset has become biased.
Over time we end up in a situation where more and more of the easier cases are handled by the machine, while the harder (and potentially more interesting) decisions are left for humans. This sounds like a happy circumstance, but trouble arises when the machine learning model needs to be updated due to data or model drift. When the model is retrained with the latest data, that data is no longer independently labeled, since the machine learning model has done part of the labeling.
Now, there are two extreme options. The first is to forget the nasty fact that the machine has done part of the labeling and retrain with all the data. However, this would mean the machine is fed back its own decisions, creating a self-reinforcing loop that solidifies its errors.
The other option would be to drop all cases handled by the machine and retrain using human decisions only. This leads to the opposite problem (assuming the retraining interval is long): the machine would see only difficult cases and likely become less certain about the decisions it used to make easily.
Are there any other choices? Yes: we can randomly sample a small part of the data where the machine has been confident and route it to human labeling. Moreover, this small part of the data can also be used for monitoring production performance.
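Continuing the earlier sketch, the random holdout could look like this (the 5% rate is a hypothetical illustration):

```python
import random

HOLDOUT_RATE = 0.05  # hypothetical: share of confident cases still sent to humans

def route_with_holdout(model, invoice, threshold=0.95, rng=random):
    """As before, but a small random share of confident cases is sent
    to human labeling anyway, keeping future training data unbiased and
    providing a sample for monitoring production accuracy."""
    label, confidence = model.predict(invoice)
    if confidence >= threshold and rng.random() >= HOLDOUT_RATE:
        return "auto", label
    return "manual", None
```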
Active Learning
Instead of simple random sampling, could we use a more optimal strategy for the problem? Active learning is a sub-field of machine learning studying just that, namely, how to best choose what data to collect for efficient training. The field has gained importance recently as the data-centric AI approach has taken over from the more traditional model-centric approach.
There are many ways to approach the problem, but all methods rank the examples by some metric of usefulness. This metric can be derived straight from the data, for instance by measuring the uniqueness of an example, the logic being that rare cases are likely to be harder for the machine and therefore useful to teach. Other approaches ask the machine learning model itself to rate the examples, again by uniqueness or, for instance, by the uncertainty of its decision. For some machine learning models it is feasible to measure the expected effect an example would have on the model if its label were available, but in most cases these approaches are computationally prohibitive. Time is an important factor too, because the older the examples get, the less valid they are likely to be.
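As an illustration of one of the simplest model-based metrics, uncertainty sampling by predictive entropy might look like this (a generic sketch, not tied to any particular model):

```python
import numpy as np

def uncertainty_scores(probabilities):
    """Score examples by predictive entropy: the higher the entropy,
    the less sure the model is, and the more useful the example is
    likely to be for labeling. `probabilities` is an
    (n_examples, n_classes) array of class probabilities."""
    p = np.clip(probabilities, 1e-12, 1.0)   # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)      # entropy per example

def select_for_labeling(probabilities, budget):
    """Return indices of the `budget` most uncertain examples."""
    return np.argsort(uncertainty_scores(probabilities))[::-1][:budget]
```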
From a practical implementation point of view, a recent Python library, ALaaS (Active Learning as a Service), offers an interesting approach: it makes multiple different active learning strategies easy to use.
It should be noted that once more complex active learning strategies are adopted, estimating production performance from these non-uniformly sampled examples becomes harder.
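One common remedy, sketched here as an assumption rather than a description of our setup, is to record the probability with which each example was routed to human labeling and weight the performance estimate by its inverse (a Horvitz-Thompson style estimate):

```python
import numpy as np

def weighted_accuracy(correct, sampling_probs):
    """Estimate production accuracy from non-uniformly sampled audits.
    Each human-labeled example is weighted by the inverse of the
    probability with which it was selected, so over-sampled hard cases
    do not dominate the estimate.

    correct: boolean array, whether the model's decision was right
    sampling_probs: probability each example had of being selected
    """
    correct = np.asarray(correct, dtype=float)
    weights = 1.0 / np.asarray(sampling_probs, dtype=float)
    return float(np.sum(weights * correct) / np.sum(weights))
```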
Study
Explicit active learning is not the only way to tackle the problem. Another is to retrain frequently. Frequent retraining has an effect implicitly similar to explicit active learning: after each retraining, slightly different examples end up in human labeling. However, relying on this as the only active learning strategy would increase the time-dependent variation of the machine learning system, a phenomenon that is usually undesirable but rarely measured.
We studied these effects on long-term performance before scaling up automatically accepted postings. We ran multiple simulated trainings, dropping varying ratios of the cases in which our previously trained model was confident from the training data. Moreover, we varied the time intervals over which this sample dropping was applied. As expected, we found some degradation of performance when the ratio of dropped samples grew very high. However, we also concluded that retraining the model at short intervals helped to stabilize long-term performance. Therefore, by combining frequent retraining with simple active learning, we can keep the automation rate relatively high without an essential long-term performance trade-off. Moreover, we consider it important that we can monitor production performance with the samples used for active learning.
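The shape of such a simulation could look roughly like the sketch below; the model, data access, and metric are hypothetical placeholders rather than our production pipeline:

```python
import numpy as np

def simulate_drop_ratios(train_sets, test_set, drop_ratios,
                         fit, accuracy, confident_mask):
    """For each drop ratio, retrain on data from which that fraction of
    previously-confident cases has been removed, and record test
    accuracy across successive retraining intervals."""
    results = {}
    for ratio in drop_ratios:
        scores = []
        for X, y in train_sets:                      # one set per retraining interval
            mask = confident_mask(X)                 # cases the old model was sure about
            drop = mask & (np.random.rand(len(X)) < ratio)
            model = fit(X[~drop], y[~drop])          # retrain without the dropped cases
            scores.append(accuracy(model, *test_set))
        results[ratio] = scores
    return results
```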