Cognitive PII Data Redaction
Machine learning algorithms implemented to redact PII data from millions of emails.
Multinational Financial Services Company Needed to Implement Machine Learning Algorithms to Redact PII Data
Our client, a multinational financial services company, needed to redact PII data contained in millions of emails per week, apply GDPR compliance rules that honor customer preferences, store the data in a data lake, and maintain an audit trail of metadata from before and after redaction.
Development of a Cognitive Computing Framework That Applies Supervised Machine Learning Algorithms
Vaco’s team built a cognitive computing framework that applies supervised machine learning algorithms to identify, validate, and redact PII data within terabyte-scale volumes of email. Vaco also developed a scalable GDPR rules engine that handles multi-level compliance scenarios in real time, as well as a robust data stewardship routine that supports metadata audit requirements.
The cognitive computing framework delivers automated email scanning; identification and classification of PII data attributes (using ML, CNN, and LSTM technology); comprehension and extraction of relevant information (using ML, DNN, and NLP); GDPR compliance business rules; and automatic population of compliant data into an enterprise data lake.
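The scan, classify, redact, and audit flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Vaco's implementation: two regular expressions (the hypothetical `PII_PATTERNS`) stand in for the supervised CNN/LSTM classifier, and the audit record (the hypothetical `AuditRecord`) simply keeps SHA-256 digests of each message before and after redaction plus a count of each redacted PII type.

```python
import hashlib
import re
from dataclasses import dataclass, field

# HYPOTHETICAL stand-in for the trained classifier: simple regex detectors.
# A real system would use model predictions over token spans instead.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


@dataclass
class AuditRecord:
    """HYPOTHETICAL audit metadata captured before and after redaction."""
    message_id: str
    sha256_before: str
    sha256_after: str
    redacted_types: dict = field(default_factory=dict)


def redact(message_id: str, body: str) -> tuple[str, AuditRecord]:
    """Replace each detected PII span with a [TYPE] token and record audit metadata."""
    counts = {}
    redacted = body
    for label, pattern in PII_PATTERNS.items():
        # subn returns the new string and the number of substitutions made
        redacted, n = pattern.subn(f"[{label}]", redacted)
        if n:
            counts[label] = n
    record = AuditRecord(
        message_id=message_id,
        sha256_before=hashlib.sha256(body.encode()).hexdigest(),
        sha256_after=hashlib.sha256(redacted.encode()).hexdigest(),
        redacted_types=counts,
    )
    return redacted, record


if __name__ == "__main__":
    text = "Contact jane.doe@example.com or +44 20 7946 0958 today"
    clean, audit = redact("msg-001", text)
    print(clean)                 # Contact [EMAIL] or [PHONE] today
    print(audit.redacted_types)  # {'EMAIL': 1, 'PHONE': 1}
```

Hashing rather than storing the original text lets an auditor confirm which message version was processed without the audit trail itself holding PII; whether the actual data stewardship routine works this way is not stated in the case study.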
The trained machine learning model achieved the following results:
Model Accuracy – 99.54%
Model Precision – 1.000
Model Recall – 0.9900
Model F1 Score – 0.9933
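For reference, the four figures above follow the standard definitions for a binary PII / non-PII decision, with F1 as the harmonic mean of precision and recall. The sketch below computes them from confusion-matrix counts. The counts in the usage example are hypothetical and are not the client's data, and the case study does not say whether its metrics were measured per token, per field, or per email.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # PII correctly flagged as PII
    fp: int  # non-PII wrongly flagged as PII
    fn: int  # PII the model missed
    tn: int  # non-PII correctly left alone


def metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    print(metrics(Confusion(tp=90, fp=10, fn=5, tn=895)))
```

For redaction, recall is usually the more critical number: a false negative leaves PII in a stored email, while a false positive only over-redacts.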
- Parses, identifies, and validates PII data within 56 million email records at a rate of 450 emails per second.
- Reduced development time to 4-6 months, versus 9-12 months for building from scratch with other onshore and offshore firms.
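As a rough check on the throughput figure, assuming a single pipeline running continuously at the stated 450 emails per second, the 56 million records take about a day and a half to process, and weekly capacity comfortably exceeds the "millions of emails per week" requirement:

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_WEEK = 7 * 24 * SECONDS_PER_HOUR

rate_per_second = 450   # emails per second, from the case study
records = 56_000_000    # 56 million email records, from the case study

# Assumes a constant rate with no downtime or parallel scaling
hours_for_records = records / rate_per_second / SECONDS_PER_HOUR
weekly_capacity = rate_per_second * SECONDS_PER_WEEK

print(f"{hours_for_records:.1f} hours to process 56M records")  # 34.6 hours to process 56M records
print(f"{weekly_capacity:,} emails per week at full rate")      # 272,160,000 emails per week at full rate
```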