Ruling the World one Disembodied Voice at a Time

The Future of Voice and the Implications for News is a report published by the Reuters Institute in November 2018. Any title that contains the words ‘future’ and ‘voice’ is enough to enthuse any journalist, and Nic Newman’s summary is well worth the read. As I logged into Wednesday’s webinar on this very topic, I reflected that, once again, I was at the disposal of a disembodied voice. Refreshingly, Newman’s engaging presentation style is far more palatable than the programmed voice of the home.


Before 2015, the idea of our houses being ruled by faceless voices seemed more Star Trek than reality. Yet, since Amazon’s release of the first hands-free speaker in 2014, the use of virtual assistants has doubled each year. No sooner had Alexa been welcomed into the bosom of the family, with the parrot having got the hang of how to use it, than other tech giants got in on the act. The Google Home, Cortana from Microsoft, Samsung’s Bixby and the Apple HomePod joined the list of smart speakers that are changing our society one task at a time.


I bought the Amazon Echo at the beginning of 2017 and, after the faff of setting it up using the Amazon Alexa app on my phone, spent a fruitless month looking through the list of skills before realising that I would never view them all because Alexa is constantly evolving to do new things. It’s disappointing, then, that she still knows so little, and that she often has to be asked to open a skill before doing anything of use. She spent eighteen months in our living room playing the radio, a hypnosis podcast that I was too scared to finish, and sound effects; like 84% of users, I have also asked her to play my music. Then she was moved to the kitchen when I bought the Google Home, and now sets timers and reminders, plays games with me when I cook, fails to find the recipe of anything I’d want to eat, gives me garbled cooking instructions, and plays a lot of radio. During meal conversation, one of us will ask her a question to which she invariably doesn’t know the answer. I will then walk a few steps to the doorway and shout to my Google Home in the sitting room. Google provides the answer we want. In fact, the only thing I can say Alexa does that Google can’t involves an amusing skill called Ditty, which sings what you say to it. Perhaps it’s no surprise – this is Google we’re talking about, after all – although Alexa is the most popular of the smart speakers in UK homes. Newman’s findings predict that this hands-free technology will soon extend to cars and other places outside the house. I think a battery-operated smart speaker would be amazing, as the freedom to control the playback of music and podcasts with soapy hands should not be underestimated.


Nic Newman began the webinar by demonstrating three of the smart speakers. He asked them to find different pieces of information, when I thought it would have been more effective if he had given the same command to each. Still, Reuters aren’t trying to play one off against another, and Newman was quick to point out their overarching benefits. The first of these is the view that smart speakers have enabled an older demographic to engage with the tech, even if they do find it difficult to get their heads around! Google’s function of enabling the user to tell it where they have left important things, and Alexa’s skills for people living with dementia, are examples of how the older generation have been thought about. During his research, Newman has encountered people who have never been able to use a smartphone, yet quickly get the hang of how a virtual assistant works. No mention was made of the assistants’ reliance on their associated app; not only is this required to connect the device to the internet, but the speaker will regularly send information to it, which the user is expected to read. Newman unintentionally created a paradox by mentioning users’ appreciation for a device which does not require gazing at screens, when companies are now experimenting by releasing smart speakers with a visual interface. Perhaps even the big Apple and know-it-all Google struggle to get their heads around what we want.


The report’s findings suggest that the smart speaker appeals to so many users because they dislike the labyrinthine nature of the great drain on time known as the World Wide Web. However, the report seems to overlook the fact that disembodied voices can take you down rabbit holes too. I’m referring to the confusing process of trying to get the speaker to understand what you want. This often involves asking for things in a particular way. I could say to Google “Postcode of Cadbury World”, yet I would have to ask “what is the postcode for Cadbury World?” for Alexa to give me the information. Subtle difference, but not so subtle when you’re dashing around getting ready for work, and you need to scribble something down before you leave.


Having these devices around the house creates all sorts of mix-ups. I once asked for the definition of a word, and the device told me about a band with the word as its name. Newman talked about the importance of the smart speakers being able to have a conversation with you. For me, Google comes out top again, as Alexa can’t remember the last question she answered, and Siri only has limited capacity for this. Even so, if I ask Google for a synonym (something I’m doing every five minutes while writing this) and I then ask it to spell the alternative word it has given, I can’t get it to tell me. Following up my original question with “hey Google, spell it” results in the response “it is spelt I t.” In frustration, I once said, “ok Google, spell the word you just said to me” and it replied, “the word you just said to me is spelt t h e w o r d…” etc. It’s like talking to an obnoxious child. I have to get the synonym I want, and then ask Google for the spelling by including the word in my request. There was one strange occasion where Google remembered its previous answer for hours so that whenever I asked it something, it informed me that “Aston Villa lost in the match against Arsenal.” I’ve asked my devices to ‘ban my boyfriend from talking about football’ – no response there! The point is that this stuff is confusing, frustrating, and laborious. Companies are aware of this, so watch this space.


It hasn’t taken long for the concerns around an ever-listening disembodied presence to emerge. Privacy is a big worry, particularly in light of Alexa launching her creepy laugh on unsuspecting ears. I asked her to demonstrate it and she said “sorry, I don’t know about that.” These voices will join in your conversations at odd times – the strangest being when I was talking about taking the dog for a walk and she played the sound of a dog barking. She regularly mistakes “I’ll ask her”, or “I’ll take her” for her name; perhaps it’s my Northern accent! Google was once playing a podcast which was co-presented by Dougal. When his colleague said, “hey Dougal”, it set Google off! I cannot understand why people are panicking about what their smart speakers hear and report back when they’ve been carrying phones around with them for years that have the same capability. Your phone has a microphone and tracks your location, which is all a smart speaker would use. I once talked to my friend who works at GCHQ about this. They said that the security services don’t have time to listen to all the recordings created by these devices – there would be no server big enough, for a start – so they pick up on certain words, like a wake word that alerts them to potential danger. If my devices are listening to me every day, they could tell you that I’m obsessed with chocolate, cake, and guinea pigs. The message here is that there’s nothing to worry about if you have nothing to hide.


It won’t have escaped your notice that I have referred to my Google Home as ‘it’ and Alexa as ‘she’. This didn’t escape Newman either. The findings in his report suggest that the gendering that is singularly applied to Alexa is due to her having more personality as an assistant than her competitors, which makes her a more integrated part of the family home. Toddlers can talk to her, although she doesn’t understand what they’re saying.

Part of this speculation arose from his data on the number of users who insert words like ‘please’ and ‘thank you’ into their requests. That indicates less to me about Alexa’s personality, and more about users’ initial reaction when talking to something that doesn’t exist in a human sense. Consider the names of these devices, and you will find that Alexa is the only one with a human name – until we welcome the next generation of Googles, Pods, Cortanas and Bixbys, anyway. It seems Amazon properly thought about this one, and allowing the user to change the wake word was a nice touch too. With a choice between Alexa, Amazon or Echo, we can see how they went for three unique female names!


The webinar’s focus was to examine how these devices will affect the news industry, and my head was full of questions by the time Newman devoted the last portion of his presentation to discussing this. A mere 1% of slaves to disembodied voices use them to listen to the news, although I wonder whether this statistic includes the larger portion of people who stream the news through playing live radio. Each device enables the listener to hear a three-minute ‘flash briefing’, which they can customise in the app. I once added The Economist news to mine, only to discover that the briefing had become an hour long because it stuck their podcast on the end of the news. Newman found that people want more information and more control to customise the device’s reported news to suit their own tastes, preferably in a minute-long bulletin. 60 seconds of news summary – that’s a challenge for the industry.


These views have become so extreme that Newman’s report describes smart speakers as an ‘existential threat’ to broadcasters. I would be surprised if there were a time in the next decade when we would actively choose to hear information from a synthesised voice over a human who happens to know a bit about talking proper. For a start, a person doesn’t require you to repeat the same question 12 times because the internet is poor that day. If smart speakers continue to grow in popularity, I predict that either society will learn to talk in the way the devices require, or the devices will die out because teaching them the art of conversation takes a lot of time and money.


I now have an Alexa and two Googles in my house, plus Siri on my phone. As a result, my maths skills have declined and I have been known to say “Alexa, what’s 9.5…” outside of the house. There’s no denying that voice-activated machines have taken off as inventor and futurist Ray Kurzweil once predicted. He also suggested that the rate of technological development is so powerful that it grows exponentially each year. If I could put one question to Reuters, I would be interested to know if they think this will be the case for our disembodied voices.
