

jott delivers voice-to-text at last!


UPDATE 1/25/07: I’ve been using Jott now for about a week and am amazed at its accuracy – especially with names and spelling of certain words. I started to get curious – it was too good to be true. So I did a search and found the following from an inquiry made by another curious blogger, Nitin. Here is the response she got from John Pollard, Jott’s CEO:

You are correct that we use a mix of human and machine technologies. Not only would it be amazing for a machine to have such high accuracy with names, technical jargon, etc., it would be impossible to do it burdened by car noise, random accents, zero grammatical context and the typical low bandwidth cellular phone connection. We are dead focused on making Jott immediately useful, in situations that are realistic.

I have to admit – I’m a bit disappointed. Not by the service, but by the fact that the wizard’s curtain has been pulled back and there is a human managing the process after all. I once made a robot that had a human inside – spitting the right 3×5 cards out for people when they asked it questions. I was in 3rd grade.

Original Post:

Well color me blind. I have been waiting and waiting for a voice recognition system that allows you to leave a voicemail and send directly to the web. Something that doesn’t garble the text TOO much. Well, it looks as though Jott delivers – at least to email for now. The Seattle-based company has been covered in the blog-dom for at least a month now, and today (after the prompting of a colleague in our office) I signed up and tried the service. IT WORKS!

My first message was to get back to Steve G. (who I need to get back to soon!) and the text transcript came through about 2 minutes later nearly perfect (it missed the word “gotomobile” but I was talking fairly fast). I remember being excited by OneBox (before they completely sold out) nearly 10 years ago. I was able to call and save voicemails in .wav format and post them on the web. At the time, it was quite revolutionary.

(From TechCrunch on December 10th) Seattle based Jott will launch its new voice to text product sometime this week. It’s very simple – a user calls a specific phone number and leaves a voice message along with a recipient or recipients (an obvious use for Jott will be for people to leave themselves quick notes). The voice message will then be converted from voice into text and delivered via email or SMS. The recipient or recipients can choose between reading the text or listening to the original voice message.

I guess the main competitors in this space are Pinger (which does not currently convert voice to text, but you can bet they are working on it) and UK-based SpinVox. SpinVox also converts spoken voice to text (SMS and emails) and posts it to your blog, a feature I wrote about a few years ago as a “dream.” I have been asking Microsoft research gurus for a few years when this type of technology would be available.

(from a SpinVox press release dated May 24, 2006) By converting voicemail messages to text, SpinVox has demonstrated increased call continuity, with an average uplift of 7% in voice traffic and 17% in text traffic; equivalent to over twice the revenue of the entire mobile content industry. SpinVox’s findings come from aggregate results of trials with retailers and network operators.

The timing of these findings is ideal, as a combination of fierce competition and regulatory requirements are currently driving down prices and the subsequent average revenue per user (ARPU) for operators. SpinVox’s data illustrates that through simple innovation there are ways of creating value in voice services today. The significant impact that this has on the bottom line for fixed-line and mobile operators, along with its impact on customer satisfaction, has given SpinVox the opportunity to deploy its service with network operators. Indeed, the Company now expects the first operator to launch voicemail-to-text as an integral part of its core voice services by the end of this year.

If voice to text and voice recognition are finally reaching a point where they can be integrated into our daily lives, everything will change. The iPhone now has its alternate navigational system. Bluetooth headsets will now dial upon command and cars will start when we say so. I guess it may take some time for voice recognition and commands to reach the level of quality and consistency needed for true integration, but we’re really on our way.

Posted on January 19th, 2007 in Thoughts
Written by Kelly Goto


7 Responses to “jott delivers voice-to-text at last!”

Comments

  1. psmith says:

    If you want voicemail to text in the U.S., I’m using a service called SimulScribe and it’s pretty good. I never have to listen to my voicemail because I read it on my phone and I have unlimited voicemail storage.

  2. e-Speaking says:

    You aren’t limited to just text and email. What about the possibility of using the service to interact with web-based applications? For example, you could use Jott’s technology to enable you to connect to a web server and access web-based applications. Initiate processes. Execute CGI/ASP scripts. Use it to connect to your house’s computer system to perform home automation tasks.

    http://www.e-Speaking.com

  3. Kelly Goto says:

    I just posted an update - I was a bit “too” amazed at the accuracy of the transcription. My mom isn’t even that good. Also, there was a significant lag time between leaving the message and getting the email/post. So I did a search and found out others were wondering the same thing - and Jott’s CEO admitted to a ‘mix of human and technology’ in the transcription process. I was wondering how this could work so well on a mobile device when friends (in the know) have said accuracy (for mobile blogging as an example) is still pretty far down the road.

  4. Bobhoppe says:

    I use the services offered by Spinvox. I’m very impressed with the speed and accuracy of their service … As far as internet applications, spinvox’s speak a blog is a very cool blogging tool. I played around with it and it is scarily accurate. I believe that they also use software to convert your voice instead of human transcription.

  5. SM says:

    Look at their UK Patent.
    They clearly state that humans transcribe !!!!!!

    http://v3.espacenet.com/textde.....=GB2420943

  6. Kelly Goto says:

    SM - it does make a difference leaving a message knowing others are listening. It’s disappointing for sure to know that the service is potentially not scalable and the quality will most likely go down (as will the speed of the transcription) once the service becomes more popular. It reminds me of KOZMO, the 7-Eleven service at your door that delivered Diet Coke, a magazine and chocolate for FREE any time in the city. Then as the service became more popular, the courier service declined, became slow and sloggy and finally you had to pre-schedule ‘windows’ of time for delivery. Eventually the company failed completely - human services aren’t scalable.

  7. joe says:

    Human services are scalable if you keep adding more humans :-)
