Certificates of Analysis (CoA) are essential documents in the quality assurance process. They are used across multiple industries (chemicals, pharma, food, etc.) and handed from supplier to customer. A CoA lists the properties measured on the specific batch of material being delivered.
Interestingly, there is no international standard format for CoAs. There are rules and regulations about what a CoA should contain (such as the EU GMP Guide Parts 1 and 2 and WHO Annex 10 for the pharma industry, or FDA regulations), but neither a fixed format nor a digital interface has been standardized, and every supplier uses a different layout and structure.
In this respect CoAs resemble other commercial documents such as invoices or purchase orders. For those documents, however, large companies have been exchanging information electronically via EDI or similar for decades, and modern software can identify the relevant information in PDF invoices using machine learning. These documents are simpler than CoAs and more widely used across industries.
For CoAs, much less automation is available. Bridging the system gap between supplier and customer still takes a lot of paper and manual work. The core data in a CoA usually consists of a table of properties and measurements. While the remaining information (supplier, batch, product, etc.) is typically similar from document to document, the measurements depend heavily on the individual material and on the contracts with the supplier. The data is largely unstructured: sometimes in tables, sometimes in lines of text, sometimes formatted in an entirely different way.
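To make the target of the extraction concrete, here is a minimal sketch of how the content of a CoA could be held as structured data once it has been read. All class and field names are illustrative assumptions, not a standard and not the schema of any particular product. The optional specification limits are also an assumption: many CoAs print them next to the measured value, but not all do.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """One row of the results table (hypothetical field names)."""
    prop: str                         # property name as printed by the supplier
    value: float                      # value measured for this batch
    unit: str = ""                    # unit as printed on the document
    spec_min: Optional[float] = None  # lower limit, if the CoA states one
    spec_max: Optional[float] = None  # upper limit, if the CoA states one

    def within_spec(self) -> bool:
        """True if the value lies inside whichever limits are given."""
        if self.spec_min is not None and self.value < self.spec_min:
            return False
        if self.spec_max is not None and self.value > self.spec_max:
            return False
        return True


@dataclass
class CertificateOfAnalysis:
    """Header data plus the table of measurements."""
    supplier: str
    product: str
    batch: str
    measurements: List[Measurement] = field(default_factory=list)

    def out_of_spec(self) -> List[Measurement]:
        """Measurements that violate a stated limit."""
        return [m for m in self.measurements if not m.within_spec()]
```

The header fields stay the same for every material, while the list of measurements can be as long and as varied as the material and the supplier contract require.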
So how do you get this information from one party (the supplier) to another (the customer) and use it as data in the goods receipt and quality assurance processes?
How to have an interface as flexible as paper
The commonly accepted interface for business data exchange between separate entities is still paper, or its equivalent, PDF. It is universally flexible and can hold all kinds of information. Properly skilled humans are very capable of reading it and extracting the data, even when format and content are structured differently every time.
If we want a fully digital process from supplier production to customer quality assurance, we need an interface that can understand CoAs on paper, in PDF format or in Excel sheets, just as a back-office clerk does. The manual, human-centered process (reading paper and typing it into a system) is cumbersome and error-prone. To digitize this process, we need a digital assistant.
Transcribing a CoA requires pattern recognition and the flexibility to understand varying layouts and formats from different suppliers. It is a tiring, error-prone task that demands concentration but little intelligence. Perfect for machine learning!
We can place a digital assistant as an interpreter between the supplier's printer and the customer's scanner.
Machine Learning can do this!
Luckily, the machine learning models required for this have evolved a lot in recent years. Combining the right models for layout, form and entity extraction gets the right data out of the document. Transformer models can identify the relevant pieces of information even if they are named differently in different documents. To increase accuracy, you can even train individual machine learning models for each supplier.
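The problem of one property appearing under different names can be illustrated without any model at all. The sketch below is a deliberately simple stand-in, not a transformer and not how any specific platform works: it maps a printed label to a canonical property name using an alias table plus fuzzy string matching from Python's standard library. The alias table, the canonical names and the function name are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical alias table: canonical property name -> labels seen on CoAs.
ALIASES = {
    "water_content": ["water content", "moisture", "water (karl fischer)"],
    "purity": ["purity", "assay", "content (gc)"],
    "density": ["density", "density at 20 c"],
}

# Flatten to alias -> canonical name for direct lookup.
_LOOKUP = {alias: canon for canon, aliases in ALIASES.items() for alias in aliases}


def canonical_property(label: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical name for a printed label, or None if unknown."""
    # Normalize case and collapse runs of whitespace.
    cleaned = " ".join(label.lower().split())
    if cleaned in _LOOKUP:
        return _LOOKUP[cleaned]
    # Fall back to fuzzy matching to tolerate typos and OCR noise.
    close = difflib.get_close_matches(cleaned, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[close[0]] if close else None
```

A table like this has to be maintained by hand for every new label. A learned model is attractive precisely because it can generalize to labels it has never seen, which an alias list cannot.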
How does BASF handle this?
At BASF we were able to automate the CoA process using the recobo platform. Recobo can use various ML models flexibly, always choosing the best in class for each use case. It is already pretrained on the typical documents and on the domain knowledge of the chemical and pharma industries. Additionally, recobo provides everything an enterprise platform needs for scaling, authentication, security and operations.
Recobo is a venture out of the BASF Chemovator incubation program. We provide enterprise-level digital assistants using best-in-class machine learning models on our platform.