Artificial Intelligence
May 29, 2020
November 18, 2024

Machine Learning - Giving us all a Bad Name?

Understand the differences between supervised and unsupervised machine learning, and why unsupervised ML offers a more scalable and efficient solution for complex records management challenges.

Machine learning (ML) is a very common AI technology. People tend to assume that ML means machines teaching themselves, but really, ML means machines learning from people.

Once the machine has learned, or been taught, it can start to make its own predictions. But the process of learning can be very onerous, depending on the approach.

There are two main approaches to ML. The first is supervised ML, in which a large set of training data is curated, labelled and shown to the machine. The machine learns to recognise new data that matches the examples it has been given. This is a robust type of ML, but it has a significant disadvantage: it requires a lot of training data, and a lot of effort to curate that data. Various vendors have run proofs of concept applying supervised ML to records and information management, and the feedback has been:

  • The AI needed a lot of training by our records team
  • We couldn't come up with 1,000 good examples of a document for every rule
  • We had to spend time correcting or confirming the machine on every single match
  • Training each rule was so onerous that we had to limit the rules we applied
  • When rules change, we will have to train all over again
  • We couldn't feasibly apply more than one 'rule' to a document
  • It was too much work to set up, and it created more work than it alleviated
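To see where that labelling burden comes from, here is a minimal supervised sketch in Python. It assumes a toy bag-of-words, nearest-centroid classifier (a textbook technique swapped in for illustration, not any vendor's product), and the class names and example texts are hypothetical. The point it shows: the model can only ever predict classes that humans have supplied labelled examples for.

```python
import math
from collections import Counter


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(labelled: list[tuple[str, str]]) -> dict[str, Counter]:
    """Supervised step: build one centroid per class from human-labelled (text, class) pairs.
    Summed counts point the same way as the class mean, which is all cosine similarity needs."""
    centroids: dict[str, Counter] = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(vectorise(text))
    return centroids


def predict(centroids: dict[str, Counter], text: str) -> str:
    """Return the single class whose centroid is most similar to the new document."""
    vec = vectorise(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training set: every class needs its own human-labelled examples.
examples = [
    ("invoice payment due supplier", "financial"),
    ("purchase order invoice amount", "financial"),
    ("employee leave application approved", "personnel"),
    ("staff performance review meeting", "personnel"),
]
model = train(examples)
print(predict(model, "supplier invoice overdue"))  # financial
```

Adding a new rule means adding a new labelled set to `examples` and retraining, which is exactly the effort the feedback above describes.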

But it doesn't have to be this way! Supervised ML simply does not scale to a problem as complex as records management, which needs to apply retention, security, privacy and handling rules from multiple different instruments and update them dynamically over the life of the record. Nor does it scale to data sets as large as, for example, corporate file shares. There's too much data, and each item is rarely about just one thing, so you can't simplify the rules to the point where supervised ML can cope. Remember that even AFDA v2, ostensibly a 'rolled up' Records Authority with only 86 classes, actually has 256 separate rule types within those classes. At 1,000 training examples per rule, that's at least 256,000 documents you would need to find, cleanse and curate for a supervised ML approach, and then you would still have to approve or deny the attempted matches.
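The arithmetic behind that figure is simple enough to write down. This sketch takes the 256 rule types from AFDA v2 as stated above, and assumes 1,000 curated examples per rule, borrowed from the "1,000 good examples ... for every rule" feedback rather than from any fixed requirement.

```python
# 256 rule types comes from the AFDA v2 figure above; 1,000 examples per rule is an
# assumption borrowed from the proof-of-concept feedback, not a fixed requirement.
RULE_TYPES = 256
EXAMPLES_PER_RULE = 1_000


def examples_to_curate(rule_types: int, examples_per_rule: int) -> int:
    """Lower bound on curated training documents: one labelled set per rule type,
    assuming each document trains only one rule and no retraining when rules change."""
    return rule_types * examples_per_rule


total = examples_to_curate(RULE_TYPES, EXAMPLES_PER_RULE)
print(f"{total:,} documents to find, cleanse and curate")  # 256,000 documents to find, cleanse and curate
```

It is a lower bound: every rule change restarts the count for the rules affected.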

And that assumes each document matches only one class. But a document is never just 'yellow, red or blue'. It's a bit yellow, a bit blue, and mostly red. A contract is not just a 'financial' record. It can also be a record of core business, relate to compensation, or even be subject to a disposal 'freeze', such as those relating to PFAS or to natural disasters that have arisen from recent Royal Commissions. Multiple rules will always need to be applied, and those rules come from multiple types of instruments. That's why Microsoft's labelling approach, which allows only one retention label per item, also can't work for records management.
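The difference between forcing one class and allowing several can be sketched in a few lines. The per-rule scores below are hypothetical (imagine they came from some upstream model), as is the 0.3 threshold. A single-label approach keeps only the top score. A multi-label approach keeps every rule that plausibly applies.

```python
def single_label(scores: dict[str, float]) -> str:
    """Single-label classification: keep only the best-scoring class."""
    return max(scores, key=scores.get)


def multi_label(scores: dict[str, float], threshold: float = 0.3) -> list[str]:
    """Multi-label classification: keep every class whose score clears the
    (hypothetical) threshold, sorted alphabetically for a stable result."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical scores for one contract against several rules.
contract_scores = {
    "financial": 0.7,
    "core business": 0.5,
    "compensation": 0.35,
    "disposal freeze": 0.4,
    "personnel": 0.1,
}
print(single_label(contract_scores))  # financial
print(multi_label(contract_scores))   # ['compensation', 'core business', 'disposal freeze', 'financial']
```

The single-label result silently drops the disposal freeze, which is the rule you can least afford to miss.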

Unsupervised ML, on the other hand, doesn't need the records team to create and curate sample sets and train the machine. The machine looks at the data itself and finds its own patterns, clusters and dimensions. It doesn't need humans to create training labels, and it doesn't need humans to 'mark' every match it makes in order to learn and improve. From the client's perspective it is a much faster, simpler and easier ML model: whereas supervised ML puts the work back on the organisation to teach the machine, unsupervised ML keeps the burden on the vendor (where it really belongs) to develop sophisticated technology.
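For contrast with the supervised sketch, here is a minimal unsupervised example: plain k-means clustering, a standard textbook technique chosen for brevity. The post does not say which unsupervised methods a vendor would use, so treat this only as an illustration of learning without labels. Each point stands in for a hypothetical two-dimensional document feature vector, and nobody tells the algorithm what the groups are.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 10) -> list[int]:
    """Plain k-means: return a cluster index for each point. No labels are used."""
    centroids = list(points[:k])  # deterministic start: the first k points (a simplification)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points now assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


# Hypothetical document features: two natural groups, but no labels supplied.
docs = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(docs, k=2))  # [0, 1, 0, 0, 1, 1]
```

Real systems would typically use far richer features and a smarter initialisation than the first k points. This version keeps the start deterministic so the result is reproducible.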

So don't throw the baby out with the bathwater! AI done wrong can have seriously negative consequences that outweigh any potential benefits. But you can (and should) have high-quality, sophisticated ML as part of your AI and automation strategy, and it doesn't have to hurt.

Video credit: Christina Trexlor on TikTok