Discussion:
voice/speech recognition
Jaco van der Merwe
2010-10-26 21:00:16 UTC
Permalink
Hi guys,

Has anyone had success with voice/speech recognition on the desktop
(specifically Ubuntu)?

There are times my hands are full or I'm otherwise occupied, but need to dictate
mail or fill in web-forms.

If there's heavy-lifting to be done, I would prefer handing off the workload to
a server, so that I'm able to implement agents on multiple clients, and so not
overburden them (some being pretty "lightweight", such as a netbook).

Any ideas that are simple to implement?

- J





_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Nevyn
2010-10-27 04:09:28 UTC
Permalink
Post by Jaco van der Merwe
Hi guys,
Has anyone had success with voice/speech recognition on the desktop
(specifically Ubuntu)?
There are times my hands are full or I'm otherwise occupied, but need to dictate
mail or fill in web-forms.
If there's heavy-lifting to be done, I would prefer handing off the workload to
a server, so that I'm able to implement agents on multiple clients, and so not
overburden them (some being pretty "lightweight", such as a netbook).
Any ideas that are simple to implement?
- J
It's something I look up every now and again but have found the
situation has remained much the same for the last 6-7 years (and
probably much the same before then). The open source offerings have,
for the most part, been frameworks rather than anything suitable for
an end user (and let's face it, this technology is very much for the
end user). I know there are a few bits and pieces for the KDE desktop
which will do things like recognise a phrase (hopefully) and launch an
application but in terms of dictation, there was nothing that struck
me as "very cool" (about as close to metrics on my opinion as I'm
going to get).

Just to be clear on my position - I'm not a big fan of the technology.
It's a curiosity thing.

Regards,
Nevyn
http://nevsramblings.blogspot.com/

Mark Foster
2010-10-27 10:06:50 UTC
Permalink
Post by Nevyn
Post by Jaco van der Merwe
Hi guys,
Has anyone had success with voice/speech recognition on the desktop
(specifically Ubuntu)?
There are times my hands are full or I'm otherwise occupied, but need to dictate
mail or fill in web-forms.
If there's heavy-lifting to be done, I would prefer handing off the workload to
a server, so that I'm able to implement agents on multiple clients, and so not
overburden them (some being pretty "lightweight", such as a netbook).
Any ideas that are simple to implement?
My first thought on reading your email, Jaco, is that I'm not sure how
you intend to hand off the processing of an audio clip to a remote
server. You're talking about something either recording your voice and
then processing the recording by samples, or offloading in realtime over
an IP network (à la VoIP)... I see server-side attempts at doing this via
a PC application as problematic.
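
For the realtime option, the usual trick in VoIP-style transports is to cut the captured audio into short fixed-length frames and send them as they arrive. A minimal sketch of that framing step follows. The function name `pcm_frames` and the defaults (8 kHz, 16-bit mono, 20 ms frames) are assumptions chosen for illustration, not anything specified in this thread.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 8000,   # assumed: telephone-quality audio
    sample_width: int = 2,     # assumed: 16-bit mono samples
    frame_ms: int = 20,        # assumed: 20 ms per frame
) -> Iterator[bytes]:
    """Yield consecutive fixed-length frames of raw PCM audio.

    The final frame may be shorter if the recording does not divide evenly.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

With these defaults each frame is 320 bytes, so one second of audio becomes 50 small packets that a client could forward one at a time instead of waiting for the whole clip.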

I am however not qualified to assert that as anything other than an
uninformed opinion.
Post by Nevyn
It's something I look up every now and again but have found the
situation has remained much the same for the last 6-7 years (and
probably much the same before then). The open source offerings have,
for the most part, been frameworks rather than anything suitable for
an end user (and let's face it, this technology is very much for the
end user). I know there are a few bits and pieces for the KDE desktop
which will do things like recognise a phrase (hopefully) and launch an
application but in terms of dictation, there was nothing that struck
me as "very cool" (about as close to metrics on my opinion as I'm
going to get).
Just to be clear on my position - I'm not a big fan of the technology.
It's a curiosity thing.
There are people who get a lot out of it - a colleague of mine at Polytech
was dyslexic and had managed to get ~99% accuracy out of Dragon
NaturallySpeaking with some training. The issue is the training: you need to teach
the system about your voice and your nuances, which can take a while -
especially if you're using it only on an ad-hoc basis.

You could track down @vavroom on Twitter and see if he has an opinion from
an accessibility POV, perhaps.

Mark.

Jaco van der Merwe
2010-10-27 20:12:38 UTC
Permalink
thanks for the feedback, guys.
I'll look into those suggestions.

re Libre solutions: at this stage, I'm not overly concerned with a 100% open
solution, as long as it does the trick. I'll commit to a Libre solution once
it's viable & usable.

re hand-off, I was thinking of how Google's "cloud" solution works for the
'droid mobiles. The audio gets recorded on the client, submitted to the back-end
cloud for processing (which I'm led to believe is a pretty hefty process), &
the results returned.
If the processing requirement for speech recognition is more than my puny
netbook is capable of, then handing it off to another server (I have an internal
VoIP gateway, so realtime is possible) seems the smarter thing to do (only
thinking out loud here).
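
The record-on-client, process-on-server round trip described here can be sketched with nothing but the Python standard library. Everything named below is hypothetical: the `/recognise` path, the JSON reply shape, and above all `recognise()`, which is a placeholder that only reports how much audio arrived. A real server would call an actual recognition engine at that point.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognise(audio: bytes) -> str:
    """Placeholder for the heavy lifting; a real engine would go here."""
    return f"<{len(audio)} bytes of audio received>"


class RecognitionHandler(BaseHTTPRequestHandler):
    """Server side: accept raw audio in a POST body, reply with JSON text."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = json.dumps({"text": recognise(audio)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def transcribe_remote(url: str, audio: bytes, timeout: float = 10.0) -> str:
    """Client side: submit recorded audio and return the server's text."""
    request = urllib.request.Request(
        url, data=audio, headers={"Content-Type": "audio/wav"}, method="POST"
    )
    # Ignore any system-wide proxy settings; the server is assumed to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]


def demo() -> str:
    """Run the server on a free local port and send it one fake clip."""
    server = HTTPServer(("127.0.0.1", 0), RecognitionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return transcribe_remote(f"http://127.0.0.1:{port}/recognise", b"\x00" * 320)
    finally:
        server.shutdown()
        server.server_close()
```

The client does almost no work beyond capturing and posting the audio, which is the point for a netbook. The cost is that recognition fails whenever the server is unreachable.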

It's a pity about this state of affairs, since it's my understanding this tech
is pretty mature on Windows & Mac, and I was hoping to avoid Dragon on Wine.

cheers

- J





Nick Rout
2010-10-27 20:24:51 UTC
Permalink
On Thu, Oct 28, 2010 at 9:12 AM, Jaco van der Merwe
Post by Jaco van der Merwe
thanks for the feedback, guys.
I'll look into those suggestions.
re Libre solutions: at this stage, I'm not overly concerned with a 100% open
solution, as long as it does the trick. I'll commit to a Libre solution once
it's viable & usable.
re hand-off, I was thinking of how Google's "cloud" solution works for the
'droid mobiles. The audio gets recorded on the client, submitted to the back-end
cloud for processing (which I'm led to believe is a pretty hefty process), &
the results returned.
Ahh, so that's why when I tried it the other day it said "no network
available" and failed!
Post by Jaco van der Merwe
If the processing requirement for speech recognition is more than my puny
netbook is capable of, then handing it off to another server (I have an internal
VoIP gateway, so realtime is possible) seems the smarter thing to do (only
thinking out loud here).
It's a pity about this state of affairs, since it's my understanding this tech
is pretty mature on Windows & Mac, and I was hoping to avoid Dragon on Wine.
cheers
- J
Daniel Pittman
2010-10-27 23:50:01 UTC
Permalink
Post by Nick Rout
On Thu, Oct 28, 2010 at 9:12 AM, Jaco van der Merwe
Post by Jaco van der Merwe
re Libre solutions: at this stage, I'm not overly concerned with a 100%
open solution, as long as it does the trick. I'll commit to a Libre
solution once it's viable & usable.
That will be a ... long way off, basically. Even the commercial stuff on
Linux gets you a choice of expensive or bad. :/
Post by Nick Rout
Post by Jaco van der Merwe
re hand-off, I was thinking of how Google's "cloud" solution works for the
'droid mobiles. The audio gets recorded on the client, submitted to the
back-end cloud for processing (which I'm led to believe is a pretty hefty
process), & the results returned.
Ahh, so that's why when I tried it the other day it said "no network
available" and failed!
It absolutely is, and as far as I know it is based on the same technique as
their text translation system:

They have literally petabytes of samples of speech along with what it
translates to; they simply not-exactly-brute-force it to identify the
candidate. (This is definitely how they do the text stuff; it is the specific
audio bits that I am unsure of.)

This is pretty much the opposite of the approach taken by the desktop tools,
which all aim to understand the speech, and of the common wisdom about how to
do this, which is that you need to have a sensible model for it.

Google don't bother trying to understand, just use the enormous volume of data
to help identify the statistically most likely candidate result for "correct".
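
As a toy illustration of "pick the statistically most likely candidate", the sketch below scores competing transcriptions by how often their word pairs occur in a body of known text. It is a standard add-one smoothed bigram model with a made-up five-sentence corpus. It is not a description of how Google's system works, which nobody in this thread claims to know in detail.

```python
import math
from collections import Counter

# Made-up stand-in for a very large collection of transcribed speech.
CORPUS = [
    "please recognise speech",
    "recognise speech on the desktop",
    "it is hard to recognise speech",
    "a nice beach",
    "wreck a car",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(candidate, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate sentence."""
    vocab = len(unigrams)
    words = ["<s>"] + candidate.split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))
        for prev, word in zip(words, words[1:])
    )


def most_likely(candidates, unigrams, bigrams):
    """Return the candidate the counts consider most probable."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


unigrams, bigrams = train_bigrams(CORPUS)
best = most_likely(["wreck a nice beach", "recognise speech"], unigrams, bigrams)
```

Here `best` comes out as "recognise speech", purely because that word pair is common in the sample data. No grammar or meaning is involved, and more data simply sharpens the counts.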
Post by Nick Rout
Post by Jaco van der Merwe
If the processing requirement for speech recognition is more than my puny
netbook is capable of, then handing it off to another server (I have an
internal VoIP gateway, so realtime is possible) seems the smarter thing to
do (only thinking out loud here).
It's a pity about this state of affairs, since it's my understanding this
tech is pretty mature on Windows & Mac, and I was hoping to avoid Dragon on
Wine.
Since several folks at my office have RSI recently[1] I have some feedback,
and they say that it is ... OK, if what you do is write "Word Documents", but
not very great for anything else.

Regards,
Daniel

Footnotes:
[1] I suspect this is the age group rather than ergonomics, since they do a
very good job of the latter. Oh, well.
--
✣ Daniel Pittman ✉ ***@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons

Mark Harris
2010-10-28 00:12:51 UTC
Permalink
Post by Daniel Pittman
Since several folks at my office have RSI recently[1] I have some feedback,
and they say that it is ... OK, if what you do is write "Word Documents", but
not very great for anything else.
I'd agree with that, having used it on Win and OSX. Does the job, but by
no means perfected. Which, given the state of play 5 years ago vs. now,
shows how bloody hard speech recognition actually is.

~mark

Daniel Pittman
2010-10-28 01:00:56 UTC
Permalink
Post by Mark Harris
Post by Daniel Pittman
Since several folks at my office have RSI recently[1] I have some feedback,
and they say that it is ... OK, if what you do is write "Word Documents", but
not very great for anything else.
I'd agree with that, having used it on Win and OSX. Does the job, but by no
means perfected. Which, given the state of play 5 years ago vs. now, shows
how bloody hard speech recognition actually is.
*nod* I think that the biggest thing that said that to me, other than the
lack of commercial success, was the fact that using what amounts to brute
force is comparable to or better than the intelligent options.

At least it has gone from "terrible with training" to "terrible without
training" in that time, I guess. :)

Daniel
--
✣ Daniel Pittman ✉ ***@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons

cr
2010-10-28 07:23:48 UTC
Permalink
Post by Daniel Pittman
Post by Mark Harris
Post by Daniel Pittman
Since several folks at my office have RSI recently[1] I have some
feedback, and they say that it is ... OK, if what you do is write "Word
Documents", but not very great for anything else.
I'd agree with that, having used it on Win and OSX. Does the job, but by
no means perfected. Which, given the state of play 5 years ago vs. now,
shows how bloody hard speech recognition actually is.
*nod* I think that the biggest thing that said that to me, other than the
lack of commercial success, was the fact that using what amounts to brute
force is comparable to or better than the intelligent options.
At least it has gone from "terrible with training" to "terrible without
training" in that time, I guess. :)
Daniel
I think the problem with the 'intelligent' options is that people just don't
speak in a self-consistent, coherent, logical fashion. (Quite aside from the
problem of accents and mumbling). So trying to apply logical rules of
syntax to the input just doesn't work, because the input is constantly
bending the rules. When we listen to someone speaking, not only is most
speech highly redundant but we only hear about half of it anyway and our
brains fill in the gaps by context and inference. Which is probably closer
to Google's brute-force approach than to the 'intelligent' approach.

I suspect it's a much harder problem than OCR.

cr


Bruce Clement
2010-10-27 23:51:33 UTC
Permalink
Despite starting with "There is currently no open-source equivalent of
proprietary speech recognition software (e.g. Nuance's Dragon
NaturallySpeaking or Windows Speech Recognition) for GNU/Linux.
However, there are several incomplete, open-source projects and
solutions that could be used to attain some elements of speech
recognition in the free operating system. It is also possible to use
Windows speech recognition software under GNU/Linux."
http://en.wikipedia.org/wiki/Speech_recognition_in_Linux goes on to
provide a list of available products ... obviously in various degrees
of completion

The one I've been meaning to have a good look at is the CMU Sphinx
series, which does seem to be under active development:
* Wikipedia http://en.wikipedia.org/wiki/CMU_Sphinx
* Unofficial wiki, including quick start guides
http://sphinx.subwiki.com/sphinx/index.php/Main_Page
* Homepage http://cmusphinx.sourceforge.net/

HTH

Bruce
--
Bruce Clement

Home:    http://www.clement.co.nz/
Twitter:    http://twitter.com/Bruce_Clement
Google Buzz: http://www.google.com/profiles/aotearoanz

"Before attempting to create something new, it is vital to have a good
appreciation of everything that already exists in this field." Mikhail
Kalashnikov
