Artificial Intelligence-Powered Assistants Potentially Transform Siri and Alexa into Practical Aids

In the coming year, advances in artificial intelligence and speech technology might finally deliver on what tech giants have been promising for more than a decade.

Apple's Advancement in Artificial Intelligence: Focus on Siri

In 2016, when Google's freshly appointed CEO Sundar Pichai introduced the Google Assistant as part of his "AI-first" strategy, he advertised the nascent voice assistant as a tool to help users accomplish tasks. He explained in a blog post that the Google Assistant "allows you to get things done, bringing you the information you need, when you need it, wherever you are."

This ambitious goal, however, has largely gone unmet. The software frequently struggles with requests, falling back on a web search or apologetically admitting its limitations. As a result, people have confined voice assistants to basic tasks such as setting timers, playing music, or controlling lights. Amazon's Alexa, released a decade ago, hasn't shown much improvement. Siri, the pioneer, introduced by Apple in 2011, has been heavily criticized.

As generative AI has gained popularity over the past two years, it has paved the way for AI "agents": software designed to complete tasks on behalf of users, such as booking reservations or making online purchases. The "agentic era," as Pichai calls it, is set to arrive in 2025, giving voice assistants a chance to finally live up to their initial promise and function as personal assistants.

Instead of merely reciting your daily schedule, as Google Assistant currently does, an assistant might be able to schedule meetings, reach out to contacts, and coordinate times that work for both parties. Voice assistants like Google Assistant, Alexa, and Siri could potentially book flights and hotels for vacations, acting as digital travel agents with little more information than trip dates and destinations.

Agents are the latest craze in the tech industry, with more than 470 platforms dedicated to the technology, according to Forrester research. The field includes tech giants as well as smaller startups like LangChain, CrewAI, and Play.ai. Beyond consumer features, agents could transform businesses by handling customer service or software development. AI agent startups have seen deal count rise by more than 81% over the past year, and more than $8 billion has been invested in the field, according to PitchBook.

"The race is on," said Steve Jang, an investor on the Midas List and founder of Kindred Ventures. "Startups will be competing with the established platforms to orchestrate this at much higher fidelity. And who can create much more humanistic and realistic voices and conversations, and access the data and actions we all want."

The voice assistants from the major tech companies are best positioned for an AI jumpstart. Google has Gemini to bolster its voice searches. Apple partnered with OpenAI to use ChatGPT for some Siri queries. And in the past year, Amazon invested $8 billion in Anthropic, which develops the potent Claude chatbot. Google declined to make any of its executives available for interviews. Apple and Amazon did not respond to interview requests.

Jang believes the real advancements will emerge from voice AI models. Unlike the large language models that power services like ChatGPT, voice models are not trained on text whose output is then read aloud by software. Instead, they are trained on actual voice audio, enabling them to detect subtleties in speech, such as cadence or emotional cues. Jang has invested in Play.ai, which focuses on voice agents and competes with companies like ElevenLabs, OpenAI, and Google that are also working on voice models.

Some, however, doubt that agents will significantly enhance the performance of major voice assistants. Kanjun Qiu, founder of Imbue, which creates agents for coding software, thinks adding more AI to these products will only produce "incremental" improvements. She argued that new AI features still won't inspire enough trust for people to rely on them. "Delegation as a paradigm is actually quite challenging for people," said Qiu. "I only use Siri for trivial things that I know it's not going to mess up."

But she believes advancements in AI and voice technology will benefit consumers in other ways. For instance, more apps will incorporate voice features, Qiu predicted. With improved latency and natural language understanding, users will be able to instruct an app to carry out a specific action, such as returning a pair of shoes that are too snug. Qiu, an engineer by training, has developed an app of her own that transforms her ramblings into a to-do list.

Improvements in AI and voice technology could also help Silicon Valley realize its unfulfilled hardware ambitions. More than a decade ago, Google notoriously flopped when it launched Google Glass, smart eyewear that sparked privacy concerns and was not particularly helpful. Earlier this month, the company unveiled new prototype glasses intended for Project Astra, Google's platform for AI agents. During a demo, the voice-controlled glasses retrieved a door code from the wearer's email when the wearer simply looked at the entry keypad. The technology could also display route information for the bus ahead or highlight points of interest, such as an art sculpture.

Meanwhile, Meta's Orion glasses, unveiled recently, use voice and hand gestures to control AI features. For instance, you could look over the items in your kitchen cupboard and ask the glasses to pull up a recipe that uses those ingredients.

Voice-driven innovations have the potential to expand technology's reach. Not everybody can read, write, or type, but many more people can speak. Voice is also popular among young people: 42% of U.S. adults aged 18-29 exchange voice messages in their messaging apps at least once a week, according to a survey conducted by YouGov and Vox.

Progress in AI might further popularize voice technologies and alter the way people engage with their devices. "It transforms voice assistants — and voice itself — into an impressive new user interface that has been untapped in computer science thus far," Jang stated.

In 2016, Google CEO Sundar Pichai envisioned Google Assistant as a powerful tool capable of accomplishing tasks for users. However, voice assistants like Google Assistant, Alexa, and Siri often fall short, resorting to web searches and admitting their limitations.

In the era of generative AI, AI "agents" are rising, with startups like LangChain, CrewAI, and Play.ai joining the tech giants in this field. These agents could potentially schedule meetings, book flights, and act as digital travel agents, transforming voice assistants into personal assistants.

The tech industry is seeing a surge of interest in AI agents, with more than 470 platforms dedicated to the technology and more than $8 billion invested. Google, Apple, and Amazon are among the major companies investing in technology to improve their voice assistants' performance.

Voice models, trained on actual voice audio, could potentially detect subtleties in speech, such as cadence or emotional cues. Investors like Steve Jang are betting on startups and companies working on voice models, like Play.ai, to create more humanistic and realistic conversations.

Despite these advancements, some believe that AI and voice technology won't significantly enhance the performance of major voice assistants, only producing incremental improvements. Kanjun Qiu, founder of Imbue, argues that even with new AI features, people may not yet trust these products enough to rely on them for delegation.
