In Silico Vox: Speech Recognition in Silicon


Lab Researchers: Patrick Bourke, Jeffrey Johnston, Edward C. Lin, Kai Yu

Collaborators: Richard M. Stern, CMU; Tsuhan Chen, CMU

Whether running on a single cell phone, a conventional PC, or an enterprise-level server farm, all of today’s state-of-the-art speech recognizers exist as complex software running on conventional computers. This is profoundly limiting for applications in which speed or mobility is essential. We need recognizers that can run significantly faster than real time, to search large online media streams for keywords. We need desktop-quality recognizers to move off our desktops and into the small, power-limited appliances we carry in our pockets. To do this, we must move the core of today’s most successful speech recognition strategies directly into silicon. This is the path taken by other critical tasks such as graphics, which have seen performance improvements of six orders of magnitude over the last decade. The CMU “In Silico Vox” project is developing a range of custom architectures for speech recognition. A recent example is our working FPGA-based prototype, which handles a 1000-word vocabulary and is, to the best of our knowledge, the most complex recognizer ever rendered completely in hardware.

Key Papers/Talks

  • “The Talking Cure,” The Economist (Technology Quarterly Issue), p. 11, March 12, 2005.
  • Patrick Bourke, Rob A. Rutenbar, “A High-Performance Hardware Speech Recognition System for Mobile Applications,” Proc. Semiconductor Research Corporation TECHCON, Aug. 2005.
  • R.A. Rutenbar, E.C. Lin, K. Yu, T. Chen, “In Silico Vox: Toward Speech Recognition in Silicon,” 2006 (Eighteenth) Hot Chips Symposium, August 2006.
  • Stephen Shankland, “Chips Promise to Boost Speech Recognition,” CNET, Aug 22, 2006.
  • Edward C. Lin, Kai Yu, Rob A. Rutenbar and Tsuhan Chen, “Moving Speech Recognition from Software to Silicon: the In Silico Vox Project,” Proc. International Conference on Spoken Language Processing (InterSpeech2006), September 2006.
  • Linda Daily Paulson, “Speech Recognition Moves from Software to Hardware,” IEEE Computer, pp. 15-18, Nov. 2006.
  • Edward C. Lin, Kai Yu, Rob A. Rutenbar and Tsuhan Chen, “A 1000-Word Vocabulary, Speaker-Independent, Continuous Live-Mode Speech Recognizer Implemented in a Single FPGA,” Proc. ACM International Symposium on FPGAs, Feb. 2007.
  • Rob A. Rutenbar, “Toward Speech Recognition in Silicon: the Carnegie Mellon In Silico Vox Project,” Invited Talk, 2007 Brammer Memorial Lecture, Wayne State University, Oct. 2007.
  • K. Yu, R.A. Rutenbar, “Generating Small, Accurate Acoustic Models with a Modified Bayesian Information Criterion,” Proc. Interspeech 2007, August 2007.
  • Patrick Bourke, Rob A. Rutenbar, “A Low-Power Hardware Search Architecture for Speech Recognition,” Proc. Interspeech’08, Sept. 2008.

