Chris Halaschek of Pindrop: As Smart Speakers Rise in Popularity, Synthetic Speech and Voice Synthesis is Something We Need to be Ready For

The huge annual Consumer Electronics Show (CES) is wrapping up today, and it seems for the second year in a row smart devices with voice assistants like Amazon’s Alexa and Google’s Assistant were the talk of the show… pun intended. With Amazon recently announcing they’ve sold over 100 million Echo devices, and analyst estimates saying Google has sold about half that amount of Google Home devices, it’s easy to see that we’re getting quite used to interacting with smart devices. But in order for people to feel comfortable enough to use these devices to handle certain kinds of more sensitive interactions and services, both personally and professionally, they’ll need even better security to protect their data from bad actors out there.

Chris Halaschek, vice president of IoT at Pindrop, a pioneering company in voice fraud prevention and authentication, invited me over to the company’s Atlanta-based headquarters to talk about where we are today with security for these kinds of devices, and what can be done to make it safer to use the popular devices to do more things.  

Below is an edited transcript of our conversation. To see the whole interview, and a demo of how voice identification can block people who aren’t you from asking Alexa for your bank information, check out the video below or click on the embedded SoundCloud player.

As Smart Speaker Use Rises, Voice Assistant Security Concerns Do As Well

Small Business Trends:  Okay, hey, this is Small Business Trends, and I’m sitting at the headquarters of Pindrop, and this is a really cool company here in Atlanta. Frequently I wish I could do more in Atlanta. This company is doing some really interesting things around voice and biometrics. I’m sitting here with Chris Halaschek. Chris, thank you for joining me today.

Chris Halaschek:  Yeah, I appreciate it Brent. Thanks for coming in.

Small Business Trends:  So tell me a little bit about you and also a little bit about Pindrop, what you guys do here.

Chris Halaschek:  I am an Atlanta native. I grew up in Atlanta and then moved up North to University of Maryland where I did my PhD in Computer Science. I spent some time in the DC area. I then headed out to the West Coast and dug into technology. I was CTO for a handful of early stage tech companies. I spent all my time building products, bringing those products to market, and then growing and scaling those businesses.

I have been at Pindrop now for the past roughly three and a half or so years. Our focus has always been to bring real time identity, security, and trust to all voice interactions. We’ve typically focused in the enterprise call center, which is predominantly where voice has been, but I think you’ll appreciate that voice is now moving well beyond the telephone channel to interesting devices like smart speakers, automotive, and so on.

Small Business Trends:  When it comes to these new devices, these smart speakers that have voice assistants in them, what is the current state of security, and where does it need to go for it to be adopted at an even higher level than we’re seeing today?

Chris Halaschek:  Yeah. It’s a good question. It’s one we need to be asking, Brent. That’s one of the reasons why I was so interested in us having this conversation because security is usually an afterthought. We’re at a point where the types of interactions that are going to be kind of achievable with these types of devices, they’re going to be a lot more rich, and they’re going to start to expose much more sensitive data. It’s not just going to be listening to music or turning on your lights.

So state of the art right now, probably in the best case, if we’re just talking smart speakers, is using a spoken four digit pin. I think any of us will probably appreciate that saying your password out loud is not really advisable. So I think there’s a lot of opportunity to bring stronger forms of identity and authentication to these various sort of voice environments, be that again a smart speaker, inside a vehicle if you’re speaking inside of your car, or even in an office setting such as this. There’s the opportunity to get access to business information assuming you can bring along with it proper security, identity, and trust.

Small Business Trends:  One of the things that I think about and a lot of us are thinking about, from your own perspective is how do you get folks like salespeople to use CRM more? Voice seems like an obvious thing for it. But from a standpoint of privacy and security, what needs to happen in order for salespeople and just folks who use business enterprise applications to make sure that the right person is using it and entering the data and accessing the data. What has to happen from a voice biometrics perspective to make it something that companies are going to feel comfortable doing?

Chris Halaschek:  I look at it as let’s say even for me if I’m going to walk into one of our conference rooms where we have a voice enabled device, and let’s say that I want to get access to perhaps some of our CRM related data related to some of our accounts, I need to make sure that because it’s a shared device that I have the right authorization to actually access that information. The opportunity as we see it, and, again, we have historically focused in the call center with both fraud detection and authentication solutions. The way we’ve approached it there I think is a similar way you can approach it in these other voice channels.

If you look at what we do in the call center today, and, again, I think this will parallel into these other channels, Brent, is that we’re trying to replace the traditional forms of authenticating someone who’s speaking in this voice channel. The way that that is typically done is using something called knowledge based authentication questions. It’s usually in authentication or security parlance something that you know. So it’s my mother’s maiden name, my last four digits of my SSN, maybe a pin or a password.

Again, we mentioned earlier in the conversation that we’re using four digit pins in smart speakers. Similar types of approaches have been used in the call center. The unfortunate reality is that that’s horribly insecure. This type of data is available on secondary markets or black markets. That’s what has led to large numbers of breaches. In the voice channel, in the enterprise call centers, what it leads to is what is effectively today a 14 billion dollar problem in terms of voice fraud loss on that channel.

We see an opportunity. And what Pindrop does is replace those pins and passwords with your voice, using our voice biometrics technology, which we can talk about in more depth. We have technology to very uniquely and accurately identify the device that’s actually active in that type of voice interaction. So we have technologies that allow us to, in a friction free way, verify the right voice, right device, right behavior.

If you look at things like smart speakers and me walking into maybe one of our conference rooms and interacting with one of the voice enabled devices there, we see a huge opportunity in taking that same voice biometrics technology to ensure that I’m the right speaker in that particular transaction. Say we use Salesforce and say “Hey, Salesforce, or Hey Einstein, let me know the latest status on the X, Y, Z opportunity”, it’s only going to give it to me because I’ve been the identified speaker, and I have access to that information.
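The flow described here, identifying the speaker first and only then checking what that speaker may access, can be illustrated with a minimal Python sketch. It is not Pindrop's method or any Salesforce API. It assumes each enrolled user has a stored "voiceprint" vector, that some upstream model (not shown) turns incoming audio into a comparable vector, and that cosine similarity against a made-up threshold of 0.8 decides a match. Every name in it (EnrolledUser, identify_speaker, answer_crm_query, MATCH_THRESHOLD) is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class EnrolledUser:
    name: str
    voiceprint: List[float]  # assumed output of an upstream voice model
    allowed_accounts: Set[str] = field(default_factory=set)


MATCH_THRESHOLD = 0.8  # hypothetical value, chosen only for illustration


def identify_speaker(embedding: List[float],
                     users: List[EnrolledUser]) -> Optional[EnrolledUser]:
    """Return the best-matching enrolled user, or None if nobody clears the threshold."""
    best_user, best_score = None, MATCH_THRESHOLD
    for user in users:
        score = cosine_similarity(embedding, user.voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def answer_crm_query(embedding: List[float], account: str,
                     users: List[EnrolledUser]) -> str:
    """Answer only if the voice is recognized AND that person may see the account."""
    speaker = identify_speaker(embedding, users)
    if speaker is None:
        return "Sorry, I don't recognize your voice."
    if account not in speaker.allowed_accounts:
        return f"Sorry {speaker.name}, you don't have access to {account}."
    return f"Here is the latest status on {account}, {speaker.name}."


# Toy data: three-dimensional vectors stand in for real voiceprints.
users = [
    EnrolledUser("Chris", [0.9, 0.1, 0.3], {"XYZ"}),
    EnrolledUser("Brent", [0.1, 0.8, 0.5]),
]
print(answer_crm_query([0.88, 0.12, 0.31], "XYZ", users))
```

The point of the two separate checks is that a shared conference room device has to answer both "who is speaking?" and "is that person allowed to hear this?" before it reads anything out.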

Small Business Trends:  Now you also do things to alert the user that the voice is either authentic or not authentic, or organic or not organic. Talk a little bit about that.

Chris Halaschek:  If you’re looking at voice identity and voice biometrics technology, you know you have to be resilient to the various threat vectors that exist and are using that type of authentication credential. The reality is that bad actors are very smart, and they go to great lengths to kind of get past these types of defenses. So if you’re looking at voice biometrics, you have a variety of different voice spoofing attack vectors that bad actors will try. It’s things like replay attacks where they actually get a recording of you doing some type of interaction, and they go back and try to leverage that recording to get access to this type of system or data.

Other more emerging attack vectors are something called synthetic speech generation or voice synthesis. I don’t know if you saw maybe the Google Duplex demo at the recent Google I/O conference.

Small Business Trends:  Yes. I saw it and was amongst the folks who were like, “Whoa, okay. This is interesting.”

Chris Halaschek:  Really cool and at the same time a little scary, right?

Small Business Trends:  Yeah.

Chris Halaschek:  I think from an end user standpoint it can drive a lot of efficiencies, but it kind of does showcase where you can go with synthetic speech generation because the bot on the other end, that was all done in real time with synthetic speech. We have some demos, and I’m happy to show you some of them today, that show just how much you can do with just a couple of minutes of audio that we pull from, say, something like YouTube. Our research team internally has built our own voice synthesis engine mainly to showcase the realities of this type of threat and why you need to protect against it.

We see things like voice distortion. We see things like voice morphing. You will have a bad actor trying to compromise someone’s bank account, and they know that it’s perhaps a female or male account, so they’ll adjust the pitch of their voice so they sound like a male or female.

Small Business Trends:  Yeah.

Chris Halaschek:  So synthetic speech and voice synthesis is something that’s coming that we’ve got to be ready for.

Small Business Trends:  When you think about enterprise applications, software applications, things that even the call center agents are using – this becomes really critical to getting over that security hump that people are legitimately worried about.

Chris Halaschek:  That’s exactly right. If you look at hearing a voice as it comes out of the telephone channel – more towards these smart speakers giving you access to things like unlocking doors in your house, which is now kind of out there – you’ve got to be thinking about these types of threats and protecting against them.

Small Business Trends:  Where are we currently in kind of the maturity of this whole situation with these smart devices and needing security?

Chris Halaschek:  I think we’re still early, which is good, and early in the sense that I think we’re just scratching the surface about the types of interactions we have with these devices. Another reason why I think it’s good is because people are starting to think ahead. We’ve talked to some of our enterprise customers, and they’re looking at bringing out voice skills to the various platforms in 2019. They want to bring richer experiences to those particular channels and environments, but they got to do it in a secure way.

Now, from a technology standpoint I think the technology is there. We just got to get it out there and be thoughtful about how you apply it. I mean, as I look to next year I think you’re going to see more and more enterprises bring these types of experiences into these channels. I think we’re still going to be doing pretty basic things. As some of the security and identity related solutions come to market in these channels, we’re going to start to expose a lot more interesting use cases and data, if that makes sense.

Small Business Trends:  How does consumer adoption of smart devices impact what happens in the enterprise? We all know that we all are consumers. We bring things into our house. We start to use them. They become real convenient. Then we start to think, “Oh, gosh, why can’t the way we use enterprise … Why can’t that be as convenient as what we do at home?”

Chris Halaschek:  I think we see a blurring of consumer and enterprise. I think the reality is we all expect compelling customer experiences both from an enterprise standpoint and a consumer standpoint because at the end of the day, you’re right, we’re all consumers. I think if you are an enterprise software company you have to still bring delightful user experiences even to your business consumers. That’s just my philosophy. I think that tide has sort of shifted a while ago. It’s really a question of looking at those business applications, and the data that is exposed for those types of applications in many cases can be viewed as a lot more sensitive.

A lot of the home usage of these voice assistants is still kind of basic, but starting to trend to things like payments and managing, things related to payments or purchases. So you’re going to start to get to more sensitive use cases. We’ve also envisioned where things like financial trading … For me as a retail consumer that may want to do things like stock trading using a voice assistant, we think those will start to come to market.

It’s really about the sensitivity of the data. I think typically on the business side you have security teams that are assessing how you’re going to expose and lock down that information whereas on the consumer side I think at least we’ve started in the smart speaker or some of these voice assistant space. It’s in the confines of your own home, a little bit more of a trusted scenario. But as you bring richer transactions there, obviously you’re going to have to have strong forms of authentication and identity.

This is part of the One-on-One Interview series with thought leaders. The transcript has been edited for publication.

Brent Leary is the host of the Small Business Trends One-on-One interview series and co-founder of CRM Essentials LLC, an Atlanta-based CRM advisory firm covering tools and strategies for improving business relationships. Brent is a CRM industry analyst, advisor, author, speaker and award-winning blogger.