Alexa is already one smart cookie. Amazon’s smart assistant has quickly become one of the most popular helpers on the market, and is capable of helping its users with everything from controlling their smart homes to answering pressing questions about life to making announcements to the household. But according to Rohit Prasad, the vice president and head scientist of the Alexa division at Amazon, we ain’t seen nothin’ yet. In fact, according to an interview with Prasad published on the Amazon blog, his team has only “scratched the surface of what’s possible.”
Prasad leads Alexa’s research and development in speech recognition, natural language understanding, and machine learning technologies, all in hopes of bettering users’ experiences with Echo devices. Since November 2014, Prasad and his team have shown that far-field speech recognition, even in loud environments, is possible with a high degree of accuracy. The reason for this, the executive says, is that Amazon has managed to bring together machine learning algorithms, data, and “immense computing power.” While conversational artificial intelligence (A.I.) has been a topic of interest among researchers for nearly five decades, it has historically been difficult for machines to not only understand, but also communicate in human language. As a result, Alexa’s ability to comprehend and respond to a “wide array of intents” makes her particularly impressive, Prasad noted.
So how exactly does Alexa work? First, your Echo device listens for a spoken audio cue, which is then converted to text by far-field automatic speech recognition (ASR) in the Amazon Web Services (AWS) cloud. Then, Alexa leverages natural language understanding (NLU) to convert these words into what Prasad calls a “structured interpretation of intent that can be used to respond to the user from the more than 30,000 Alexa skills built by first- and third-party developers.” This interpretation is coupled with certain contexts, like what kind of device the speaker is using, who the speaker is, or the most likely skills capable of providing a response. This context ultimately helps decide what Alexa’s next action ought to be, whether that’s giving a response or asking for more information.
Alexa then responds using text-to-speech synthesis (TTS), which translates strings of words into intelligible audio. Of course, the challenge here is to ensure that Alexa responds not only accurately, but quickly as well. As Prasad noted, “As scientists and engineers we’re always battling this healthy tension between accuracy and latency from when the user stops speaking to Alexa to when she responds.”
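For the curious, the chain described above (speech recognition, then language understanding, then a context-aware decision, then speech synthesis) can be pictured with a toy Python sketch. To be clear, this is not Amazon's code: every function and name below is a hypothetical stand-in, the "audio" is plain text, and simple keyword rules take the place of the machine learning models Prasad describes.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Structured interpretation of a request (hypothetical shape, not Alexa's)."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: str) -> str:
    """Stand-in for ASR: this toy treats the 'audio' as already-transcribed text."""
    return audio.strip().lower().rstrip("?.!")


def understand(text: str) -> Intent:
    """Stand-in for NLU: keyword rules instead of a learned model."""
    if "weather" in text:
        words = text.split()
        # Treat the word right after "in" as the city, if there is one.
        city = words[words.index("in") + 1] if "in" in words[:-1] else None
        return Intent("GetWeather", {"city": city})
    return Intent("Unknown")


def decide(intent: Intent, context: dict) -> str:
    """Use context to choose between answering and asking for more information."""
    if intent.name == "GetWeather":
        # Fall back to context (an assumed 'default_city' key) when no city was spoken.
        city = intent.slots.get("city") or context.get("default_city")
        if city is None:
            return "Which city would you like the weather for?"
        return f"Looking up the weather in {city.title()}."
    return "Sorry, I didn't catch that."


def synthesize(text: str) -> str:
    """Stand-in for TTS: wraps the reply in a tag instead of producing audio."""
    return f"<speech>{text}</speech>"


def handle(audio: str, context: dict) -> str:
    """Run one request through the whole toy pipeline."""
    return synthesize(decide(understand(recognize_speech(audio)), context))


if __name__ == "__main__":
    print(handle("What's the weather in Seattle?", {}))
    print(handle("What's the weather?", {"default_city": "Boston"}))
    print(handle("What's the weather?", {}))
```

The third request shows the "ask for more information" branch: with no city in the sentence and none in the context, the toy assistant asks a follow-up question instead of answering.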
So what is it that makes Alexa more capable than other smart assistants? Apparently, it has a lot to do with the fact that Alexa lives mostly in the cloud, which means the more you talk to her, the smarter she becomes. The smart assistant employs a range of learning techniques, but Prasad pointed out that Alexa “scientists and engineers are continually applying and inventing new learning techniques,” including what’s called transfer learning, which allows Alexa to apply lessons learned from one skill to another, or even one language to another.
As for what we can look forward to, not only from Alexa but from smart assistants as a whole, Prasad has a few ideas. “A.I. will have deep societal impact and will help humans learn new skills that we can’t even imagine today,” he said. “In the next five years, we will see conversational A.I. get smarter on multiple dimensions as we make further advances with machine learning and reasoning. With these advances, we will see Alexa become more contextually aware in how she recognizes, understands, and responds to requests from users.” Ultimately, Prasad envisions a future in which the smart assistant will be able to engage in conversations regarding current events and other everyday topics. You can already check out what is possible by saying, “Alexa, let’s chat.” You may just be surprised by what you learn.