Cognitive PII Data Redaction

Machine learning algorithms implemented to redact PII data from millions of emails.

  • Client

    Multinational Financial Services Company

  • Services

    Consulting, Managed Services

  • Areas of Expertise

    Technology

  • Industry

    Financial Services

Our Challenge

Multinational Financial Services Company Needed to Implement Machine Learning Algorithms to Redact PII Data

Our client, a multinational financial services company, needed to redact PII data contained in millions of emails per week, apply GDPR compliance rules that honor customer preferences, store the data in a data lake, and provide an audit trail of metadata from before and after redaction.

Our Solution

Development of Cognitive Computing Framework Which Applies Supervised Machine Learning Algorithms

Vaco’s team built a cognitive computing framework that applies supervised machine learning algorithms to identify, validate, and redact PII data within terabytes of emails. Vaco also developed a scalable GDPR rules engine to handle multi-level compliance scenarios in real time, as well as a robust data stewardship routine to support metadata audit requirements.
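The case study does not describe how the rules engine resolves multi-level compliance scenarios. One possible reading is a prioritized rule list, where the first applicable rule at the highest-precedence level decides the outcome. The sketch below illustrates that idea only. The rule names, the record fields (`legal_hold`, `consent`), the level ordering, and the default action are all hypothetical and are not taken from the client implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    level: int                        # hypothetical: lower number wins
    applies: Callable[[dict], bool]   # predicate over an email's metadata
    action: str                       # "redact" or "retain"


def decide(record: dict, rules: list[Rule],
           default: str = "redact") -> tuple[str, Optional[str]]:
    """Return (action, rule name) for the highest-precedence applicable rule."""
    for rule in sorted(rules, key=lambda r: r.level):
        if rule.applies(record):
            return rule.action, rule.name
    # No rule matched: fall back to the conservative default.
    return default, None


# Hypothetical rule set, for illustration only.
RULES = [
    Rule("regulatory_hold", 0, lambda r: r.get("legal_hold", False), "retain"),
    Rule("customer_opt_out", 1, lambda r: not r.get("consent", False), "redact"),
    Rule("customer_consented", 2, lambda r: r.get("consent", False), "retain"),
]

if __name__ == "__main__":
    print(decide({"consent": False}, RULES))
    print(decide({"consent": False, "legal_hold": True}, RULES))
```

Returning the name of the rule that fired alongside the action makes each decision traceable, which fits the metadata audit requirement mentioned above.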

The cognitive computing framework delivers automated scanning of emails to identify and classify PII data attributes (using ML, CNN, and LSTM technology); comprehension and extraction of relevant information (using ML, DNN, and NLP); application of GDPR compliance business rules; and automatic population of compliant data into an enterprise data lake.
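The identify, redact, and audit steps described above can be sketched in a few lines. This is a minimal illustration, not the framework itself: regular expressions stand in for the trained CNN/LSTM models, and the pattern set, placeholder format, and audit fields (`sha256_before`, `sha256_after`, `redactions`) are assumptions made for the example.

```python
import hashlib
import re

# Assumption: two simple patterns stand in for the ML-based PII detector.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    """Return (start, end, label) spans sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()
    return spans


def redact(text):
    """Replace detected spans with [LABEL] and build an audit record."""
    pieces, cursor, counts = [], 0, {}
    for start, end, label in detect(text):
        if start < cursor:
            # Skip a span that overlaps one already redacted.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        counts[label] = counts.get(label, 0) + 1
        cursor = end
    pieces.append(text[cursor:])
    redacted = "".join(pieces)
    # Hypothetical audit metadata: content digests before and after
    # redaction, plus a count of redactions per PII type.
    audit = {
        "sha256_before": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "sha256_after": hashlib.sha256(redacted.encode("utf-8")).hexdigest(),
        "redactions": counts,
    }
    return redacted, audit


if __name__ == "__main__":
    body = "Contact jane.doe@example.com, SSN 123-45-6789."
    clean, record = redact(body)
    print(clean)
    print(record["redactions"])
```

In a production pipeline the `detect` step would be replaced by model inference, while the redaction and audit logic could stay largely the same.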

The machine learning model was trained to achieve:

Model Accuracy – 99.54%

Model Precision – 1.000

Model Recall – 0.9900

Model F1 Score – 0.9933
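For readers less familiar with these metrics, the sketch below shows the standard definitions of accuracy, precision, recall, and F1 for a binary "PII / not PII" decision. The counts in the usage example are made up for illustration and are not the client's evaluation data. The case study does not state how its figures were aggregated across PII types.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    tp: PII items correctly flagged      fp: non-PII items wrongly flagged
    fn: PII items missed                 tn: non-PII items correctly ignored
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=90, fp=10, fn=30, tn=870).items():
        print(f"{name}: {value:.4f}")
```

For a redaction task, recall is usually the metric to watch most closely, since a missed item (a false negative) leaves PII exposed, whereas a false positive only over-redacts.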

Impact

  • Parses, identifies, and validates PII data within 56M email records at a rate of 450 emails per second.
  • Reduced development time to 4-6 months, versus 9-12 months for development from scratch by other onshore and offshore firms.
