Monday, 30 April 2012

Fear Perfect Voice Recognition Software. FEAR!

I was chatting with a friend recently regarding their "smart" phone's voice recognition software. We had a good laugh, mostly about its inability to recognize names.  My first encounter with voice recognition was when I received a phone call from my brother (not the strangest one ever, might I add), that went something like this:

Me: Allo!

Bro: Um, you're not my wife.

Me: ... no. Ew.

Bro: I was trying to call my wife.

Me: Hello?  Hellloooo?

My brother is uncouth. I obviously won out with the couthness gene (truly). He informed me, days later, that he had been testing voice recognition on his car's Bluetooth system.  I'm not sure how exactly that worked out, since my name is "Marie" and my sister-in-law's is "Jessica."  I figure he shortened it in his address book to "Jessy" or "Wessy," as she's not one to stand on three-syllable formalities. Or he calls her something else in his address book that I really don't want to know about. Regardless, not close enough to be that easily confused with my name.

Almost every time a new operating system comes out on a "smart" phone, they claim the voice recognition software has gotten better.  (I'm sorry, I can't write that without the quotation marks. If you think a phone is smart, you need to get yourself some smarter friends, is all I'm saying.)

Oh ya?  Sure, why not.  But here's the scary truth (as I see it. It's my blog).  To have perfect voice recognition software can only mean two things.  (I was with a bunch of people at a pub and we came up with a third, but I seriously don't recall, possibly because of those awesome caesars...  It worked with category one, though, so we'll let that stand.)

1. Computers are so smart they adapt to language, dialect and contractions

Oh ya.  No two ways about it.  Language, my friend, is a complex thing.  To truly have voice recognition software that is perfect, and by that I mean you don't have to use canned sentences, you don't have to speak - exactly - like - so - enunciating - e-ve-ry-sy-lla-b-le-per-fect-ly, you don't have to hide your regional dialect, AND you stick to actual ways of speaking, not that canned dictionary shit.

Heck, even Speak Like a Pirate Day should be no deterrent to perfect voice recognition software. Why would it be?  Words are words, right?

Not so. Words are sounds.  Words are living organisms affected by the speaker, and grammar rules need not be obeyed.  Sounds roll and trickle and tumble away in a cacophony of colour echoed by walls, picked up by nearby ears, reflected by windows.  Words, as it stands, are not static, can be screamed or chanted, sung or hurled.

Words are sounds, and sound does not obey the simple "from my mouth to your metallic ear" scenario. Sound picks up other bits from its surroundings and carries them to its destination.  Sound always checks extra baggage, unless you're in a perfectly hermetic room with no echoes.  Sound likes, heck, loves company. And sound loves to dance, and the human mouth loves to play with words and make them stern, sexy, angry, loving.  The human mouth and words are long-time lovers, let's face it, and we grow familiar with our lovers. We get lazy about the perfection of the affair. So our words suffer for their familiarity.

When I had an iBook G4, a few years back, I could ask it to tell me a joke. I had to be very precise in how I said it.  "Tell • Me • A • Joke."  Granted, that was a few years ago, and all it could tell me were knock knock jokes (and not very funny ones, at that), but still.  It was a pain.

So, if computers understand us all, all the time (which I certainly don't), then they're smart without the quotation marks.  And in all languages, too. Remember, we're discussing perfect voice recognition.  No compromise.

It's a Skynet level of intelligence.  Even on Star Trek the computer didn't get everything. It had its little "I totally have no clue what you just babbled" noise.  And it never tried to destroy humanity, either (well, except when taken over by an evil/sexy intelligence, but that's a few other story lines).

Do you want to live in Skynet's world?  I didn't think so. So dial up your numbers, already.

Okay, what's the second scenario, then?  In what other circumstances could we have achieved this?

2. The world has become such a homogenized place that we all speak the same way

Scary, ain't it?

English is my second language.  At this point in my life, I think it's safe to say that I'm perfectly bilingual.  But throw in any accent, and I'm in trouble. Not so in French.  I can understand all sorts of accents in my mother tongue, can identify where people are from, and can replicate them.

I can't in English. I can't do an accent to save my life (thankfully, it's never come to that). Heck, I can barely do a French Canadian accent, unless I'm tired and it slips in naturally.

I remember in university, watching a film on antisemitism.  A German-speaking man walked us through a concentration camp.  I turned to Roomy (in the early days of our friendship, but she was already a Roomy by then), and she was nodding along.  Now, we were taking German classes at that point, since so many archaeological papers are written in German, but still, I was impressed with how much more advanced she was than me.

She looked at me, saw my look of admiration, and said: "He's speaking English, you know."

She always tries to make me feel better about that by saying he did have a strong accent, but I think she's just trying to be nice.

So which one is scarier?  Super intelligent computers or a homogenized human population?

I dunno about you, but I'm willing to fight Skynet, if it means avoiding scenario 2.