
Some details
Processing of 200,000+ legal documents from incoming mail every month. Applying a standard OCR tool wasn't enough to bring real process automation, as all details still had to be transferred to the proper systems and OCR errors had to be corrected. Periodically, batches of 500,000+ files had to be processed in the shortest possible time.
Solution
Set of text classifiers – recognizing the document type
Named Entity Recognition models – mining the information
Image Processing Pipelines – improving OCR input quality
Quality Assurance Framework – assessing whether the extracted details are consistent with business logic
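To illustrate how the classification, extraction and quality assurance components might fit together, here is a minimal Python sketch. It is not the system described here: the keyword rules stand in for trained text classifiers, the regular expressions stand in for NER models, and the document types, entity patterns and required-field rules (`DOC_TYPE_KEYWORDS`, `ENTITY_PATTERNS`, `REQUIRED`) are all hypothetical. The image processing stage is left out because it would operate on scans before OCR.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for trained text classifiers.
DOC_TYPE_KEYWORDS = {
    "invoice": ("invoice", "amount due"),
    "court_summons": ("summons", "court"),
    "power_of_attorney": ("power of attorney",),
}

# Hypothetical regex patterns standing in for NER models.
ENTITY_PATTERNS = {
    "date": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),
    "case_number": re.compile(r"\b[A-Z]{2,3}/\d{3,6}/\d{2}\b"),
    "amount": re.compile(r"\b\d+(?:[.,]\d{2})?\s?(?:EUR|PLN|USD)\b"),
}

# Hypothetical business rules: entities each document type must contain.
REQUIRED = {
    "invoice": ("date", "amount"),
    "court_summons": ("date", "case_number"),
    "power_of_attorney": ("date",),
}


@dataclass
class ExtractionResult:
    doc_type: str
    entities: dict[str, list[str]]
    issues: list[str] = field(default_factory=list)


def classify(text: str) -> str:
    """Pick the document type with the most keyword hits, or 'unknown'."""
    lowered = text.lower()
    scores = {t: sum(kw in lowered for kw in kws) for t, kws in DOC_TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_entities(text: str) -> dict[str, list[str]]:
    """Collect every match of each entity pattern in the OCR text."""
    return {name: pattern.findall(text) for name, pattern in ENTITY_PATTERNS.items()}


def validate(doc_type: str, entities: dict[str, list[str]]) -> list[str]:
    """Flag documents that fail the business rules for their type."""
    issues = []
    if doc_type == "unknown":
        issues.append("document type not recognized")
    for name in REQUIRED.get(doc_type, ()):
        if not entities.get(name):
            issues.append(f"missing {name}")
    return issues


def process(text: str) -> ExtractionResult:
    """Run classification, extraction and validation on one document's text."""
    doc_type = classify(text)
    entities = extract_entities(text)
    return ExtractionResult(doc_type, entities, validate(doc_type, entities))


if __name__ == "__main__":
    print(process("Court summons. Case AB/12345/21, hearing on 14.03.2022."))
    print(process("Invoice no. 7 dated 01.02.2023"))
```

In this sketch, a document with a non-empty `issues` list (for example, an invoice with no amount) would be routed to manual review instead of being passed straight to downstream systems.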