Amazon is adding a new privacy-focused feature to its business transcription service, one that automatically redacts personally identifiable information (PII) such as names, social security numbers, and credit card credentials.

Amazon Transcribe is part of Amazon’s AWS cloud unit and became generally available in 2018. In a nutshell, Transcribe is an automatic speech recognition (ASR) service that enables enterprise customers to convert speech into text, which can help make audio content searchable from a database, for example. Contact centers can also use it to mine call data for insights and sentiment analysis. However, privacy issues have cast a spotlight on how technology companies store and manage consumers’ data.

Privacy

The problem Amazon is looking to solve has to do with minimizing access to sensitive information. Speech-to-text transcripts can be searched for keywords and analyzed for sentiment at a later date, but phone calls often feature significant private data that may also be transcribed by Amazon and stored in a searchable database, even if that information is not necessary for analysis purposes. There is also a growing number of regulations around the world designed to protect consumer data, including the recently implemented California Consumer Privacy Act (CCPA) and Europe’s General Data Protection Regulation (GDPR).

Against this backdrop, Amazon Transcribe will now enable companies to automatically redact personal data, including credit/debit card numbers, expiration dates, CVV codes, PINs, social security numbers, bank account numbers, customer names, email addresses, phone numbers, and postal addresses. It’s worth noting that Google Cloud Platform offers a data loss prevention API that could be used in conjunction with its speech-to-text service to identify and redact sensitive data. But building automated redaction directly into Amazon Transcribe should make the process a lot easier to implement.
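To give a rough sense of what enabling redaction might look like in practice, here is a minimal sketch using the AWS SDK for Python (boto3). It assumes the `ContentRedaction` setting of the `start_transcription_job` call; the helper names, job name, and S3 locations are hypothetical placeholders, and whether an output bucket is required is an assumption that should be checked against the AWS documentation.

```python
def build_redaction_job_request(job_name, media_uri, output_bucket):
    """Build keyword arguments for a Transcribe job with PII redaction enabled.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    return {
        "TranscriptionJobName": job_name,
        # The article notes that redaction is limited to U.S. English for now.
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        # Assumption: the transcript is written to a bucket the customer owns.
        "OutputBucketName": output_bucket,
        "ContentRedaction": {
            "RedactionType": "PII",
            # "redacted" asks for only the redacted transcript.
            "RedactionOutput": "redacted",
        },
    }


def start_redacted_job(request):
    """Submit the request to Amazon Transcribe (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)


# Hypothetical names, for illustration only.
request = build_redaction_job_request(
    "demo-redaction-job",
    "s3://example-bucket/call-recording.wav",
    "example-output-bucket",
)
```

Keeping the request builder separate from the network call makes the configuration easy to inspect and test before any audio is submitted.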

Companies using Amazon Transcribe can use automatic redaction as they see fit and can choose which PII elements they wish to obfuscate. The transcribed text will then display a [PII] tag in place of the sensitive information, and the corresponding timestamps mean anyone with sufficient system access will still be able to locate the necessary PII in the original audio file. This may also prove useful if a company wants to carry out extra audio processing to fully redact the information in the original recording.
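As an illustration of how those timestamps could be used, the sketch below scans a transcript for [PII] tags and collects the matching time ranges, which could then feed a separate audio-scrubbing step. It assumes a transcript JSON layout with a `results.items` list whose entries carry `start_time`, `end_time`, and an `alternatives` list with a `content` field; that layout and the function name `find_redacted_spans` are assumptions for illustration, not something the article specifies.

```python
import json


def find_redacted_spans(transcript_json, tag="[PII]"):
    """Return (start, end) times in seconds for every item replaced by the tag."""
    doc = json.loads(transcript_json)
    spans = []
    for item in doc.get("results", {}).get("items", []):
        alternatives = item.get("alternatives", [])
        # Skip items with no text or whose text is not the redaction tag.
        if not alternatives or alternatives[0].get("content") != tag:
            continue
        # Only items with both timestamps can be located in the audio.
        if "start_time" in item and "end_time" in item:
            spans.append((float(item["start_time"]), float(item["end_time"])))
    return spans


# Hypothetical transcript fragment in the assumed layout.
sample = json.dumps({
    "results": {
        "items": [
            {"start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"content": "My"}]},
            {"start_time": "0.4", "end_time": "0.9",
             "alternatives": [{"content": "card"}]},
            {"start_time": "1.5", "end_time": "3.9",
             "alternatives": [{"content": "[PII]"}]},
        ]
    }
})

print(find_redacted_spans(sample))  # prints [(1.5, 3.9)]
```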

Amazon Transcribe is available in 31 languages, six of which are supported by real-time transcription, though for now the automated redaction feature is limited to U.S. English. The feature is billed monthly at a rate of $0.00004 per second of content.
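For a sense of scale, the per-second rate can be turned into a quick estimate. The sketch below uses only the $0.00004 figure quoted above; the function name is hypothetical, and it assumes the redaction charge is a simple rate multiplied by audio length, ignoring any base transcription fee, free tier, or minimum billing increment.

```python
from decimal import Decimal

# Redaction rate quoted in the article, in U.S. dollars per second of audio.
REDACTION_RATE_PER_SECOND = Decimal("0.00004")


def redaction_cost(audio_seconds):
    """Estimate the redaction charge for a whole number of seconds of audio."""
    return REDACTION_RATE_PER_SECOND * Decimal(audio_seconds)


one_hour = redaction_cost(3600)      # a single one-hour recording
thousand_hours = redaction_cost(3600 * 1000)  # a month of contact center calls, say
```

Under those assumptions, one hour of audio works out to $0.144 for redaction, and 1,000 hours to $144.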