Purpose: Invalidity Analysis


Patent: US7979277B2
Filed: 2004-09-14
Issued: 2011-07-12
Patent Holder: (Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd
Inventor(s): Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds

Title: Speech recognition circuit and method

Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.
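The abstract describes a data flow of score memory, accumulator, and selector over a lexical tree. The following is a minimal illustrative sketch of that flow, not the patented circuit; all class, function, and state names (`StateScoreMemory`, `accumulate`, `select_best`, `"n1"`…) are hypothetical:

```python
import heapq

class StateScoreMemory:
    """Memory structure keyed by state identifier: supports lookup,
    reading a score, and writing the modified score back."""
    def __init__(self):
        self._scores = {}  # state_id -> accumulated score

    def read(self, state_id):
        return self._scores.get(state_id, 0.0)

    def write_back(self, state_id, score):
        self._scores[state_id] = score

    def items(self):
        return self._scores.items()

def accumulate(memory, score_updates):
    """Accumulator: add per-frame score updates (e.g. acoustic
    log-probabilities from audio input) to stored scores and
    write them back to the memory structure."""
    for state_id, update in score_updates.items():
        memory.write_back(state_id, memory.read(state_id) + update)

def select_best(memory, beam_width):
    """Selector: keep the best-scoring states, modelling beam-style
    selection of lexical-tree nodes according to their scores."""
    best = heapq.nlargest(beam_width, memory.items(), key=lambda kv: kv[1])
    return [state_id for state_id, _ in best]

mem = StateScoreMemory()
accumulate(mem, {"n1": -2.0, "n2": -0.5, "n3": -1.0})  # audio frame 1
accumulate(mem, {"n1": -0.1, "n2": -3.0})              # audio frame 2
print(select_best(mem, 2))  # → ['n3', 'n1']
```

The sketch uses a dictionary where the claim recites a hardware memory structure; it is meant only to make the lookup / read-modify-write-back / select cycle concrete.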




Disclaimer: The promise of Apex Standards Pseudo Claim Charting (PCC) [ Request Form ] is not to replace expert opinion but to provide due diligence and transparency prior to high-precision charting. PCC conducts aggressive mapping (based on Broadest Reasonable, Ordinary or Customary Interpretation and Multilingual Translation) between a target patent's claim elements and other documents (potential technical standard specifications or prior art in the same or across different jurisdictions), thus allowing a top-down, a priori evaluation with which stakeholders can assess standard essentiality (potential strengths) or invalidity (potential weaknesses) quickly and effectively before making complex, high-value decisions. PCC is designed to relieve the initial burden of proof via an exhaustive listing of contextual semantic mappings as potential building blocks toward a litigation-ready work product. Stakeholders may then use the mapping to modify shortlisted PCC entries or identify other relevant materials in order to formulate strategy and achieve further purposes.



  Independent Claim

Ground | Reference | Owner of the Reference | Title | Semantic Mapping | Basis | Anticipation | Challenged Claims (1-16)
1

US20030200089A1

(Kenichiro Nakagawa, 2003)
(Original Assignee) Canon Inc     

(Current Assignee)
Canon Inc
Speech recognition apparatus and method, and program speech recognition method speech recognition method

speech accelerator speech recognition means

speech recognition circuit digital watermark

audio time frame, time frame speech data

control means said input

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches a method for controlling a system especially an electrical and/or electronic system comprising storing control…

discloses a system where a remote control is able to control the volume of at least one device first mode and also used…

discloses generating loading instructions but does not specifically disclose doing so after scanning of a package…

teaches wherein the multimodal application further comprises a plurality of elements at least one of the elements…
XXXXXXXXXXXXXXXX
2

JP2004086150A

(髙見 雅之 [Masayuki Takami], 2004)
(Original Assignee) Denso Corp; 株式会社デンソー     音声制御装置 [Voice control device] next feature 備えること [comprising]

word identification 識別手段 [identification means]

audio signal データ [data]

feature calculation 前記一 [the aforementioned (fragment)]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches a private branch exchange extension lines and station lines as well as speech terminals see elements…

discloses the system comprising a speech recognition device configured to interpret the continuous or interrupted flow…

teaches wherein each state is associated with a context wherein there is one context for each state…

discloses a speech-enhanced device with partial vocabularies for consumer audio/video functionality of a CD player…
XXXXXXX
3

JP2004170765A

(Hiroaki Ogawa, 2004)
(Original Assignee) Sony Corp; ソニー株式会社     音声処理装置および方法、記録媒体並びにプログラム [Speech processing apparatus and method, recording medium, and program] speech accelerator 音声処理方法 [speech processing method]

search stage processing 前記選択手段 [said selection means]

next feature 備えること [comprising]

result memory 処理結果 [processing result]

word identification 他の情報 [other information]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses the mobile computing device multimodal device is implemented as a multimodal device ROSE abstract…

teaches a method comprising a step of re-recognizing said speech from said user when a part of said correct values is…

teaches displaying a list of possible words to the user wherein each possible word is having an associated symbol…

teaches a method wherein said displaying device is a touch screen see the pen…
XXXX
4

CN1420486A

(李恒舜, 2003)
(Original Assignee) Motorola Inc     

(Current Assignee)
Motorola Solutions Inc
基于决策树的语音辨别 [Decision-tree-based speech discrimination] interrupt signal 一个副本 [a copy]

distance results 间隔的 [spaced / at intervals]

feature calculation 频谱特 [spectral feature (fragment)]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches wherein performing a partial action comprises starting an application on the electronic device…

discloses a robot apparatus with a vocal interactive function comprising a microphone for collecting a vocal input…

teaches detecting a time based offset location of a received content corresponding to an audio fingerprint ie…
XXX
5

US20030200085A1

(Patrick Nguyen, 2003)
(Original Assignee) Individual     

(Current Assignee)
Sovereign Peak Ventures LLC
Pattern matching for large vocabulary speech recognition systems word identification searching operation

second processor, dynamic scheduling second processor, processing power

front end search algorithm

first processor first processor

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses every limitation claimed as applied above in claim…

discloses the features indicated above as evidenced by the fact that one of ordinary skill in the art would clearly…

discloses a method for expanding vocabulary but does not specifically teach using a search engine…

discloses a method comprising finding the text data in an internal or external telecommunications network internet using…
XXXXXXXXX
6

KR20020095731A

(김창민 [Kim Chang-min], 2002)
(Original Assignee) 주식회사 엑스텔테크놀러지 [Xtel Technology Co., Ltd.]     음성특징 추출장치 [Speech feature extraction apparatus] audio time frame 음성구간 [speech segment]

audio front end 청각의 [auditory]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses employing word spotting techniques and similarity template matching in keyword recognition…

discloses user-independent speech recognition that assigns probabilities to individual sounds in a real-time fashion…

teaches wherein a match in said finding step is deemed to have occurred if the number of differences between the selected…

teaches a method of measuring confidence of speech recognition in a speech recognizer the method comprising detecting…
XXXXX
7

JP2000293191A

(Hiroki Yamamoto, 2000)
(Original Assignee) Canon Inc; キヤノン株式会社     音声認識装置及び音声認識方法並びにその方法に用いられる木構造辞書の作成方法 [Speech recognition apparatus, speech recognition method, and method for creating the tree-structured dictionary used in that method] next feature 更に有すること [further comprising], 備えること [comprising]

result memory 認識結果 [recognition result]

processor to identify words 読メモリ [read(-only) memory (fragment)]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses making the transition includes fading between one visual output and another visual output…

discloses wherein the modifying including enlarging the video frame…

discloses a control system and method for controlling a plurality of medical devices such as ultrasound system intercom…

teaches that the end user can speak an utterance into the terminal…
XXX
8

GB2333172A

(Tony Robinson, 1999)
(Original Assignee) SoftSound Ltd     

(Current Assignee)
SoftSound Ltd
Recognition audio signal, digital audio signal temporal alignment

calculating means calculating means

calculating distances determining means

audio time frame, time frame speech data

XXXXXX
9

US5878392A

(Peter Meyer, 1999)
(Original Assignee) US Philips Corp     

(Current Assignee)
US Philips Corp
Speech recognition using recursive time-domain high-pass filtering of spectral feature vectors digital audio spectral component

calculating circuit generating means

audio signal window function

feature calculation pass filtering

result memory speech signal

time frame time frame

35 U.S.C. 103(a)

35 U.S.C. 102(b)
describes and makes obvious those limitations as indicated there…

describes processing a voice message as the embodiment for stored audio data…

teaches detecting a time based offset location of a received content corresponding to an audio fingerprint ie…

discloses a system for topic discrimination using posterior probability scores or confidence scores such that topic…
XXXXXXXX
10

EP0780828A2

(Mazin G. Rahim, 1997)
(Original Assignee) AT&T Corp     

(Current Assignee)
AT&T Corp
Method and system for performing speech recognition speech recognition circuit, word identification recognizing speech

digital audio, digital audio signal domain converter, said signals

calculating means, speech recognition method following steps

next feature vector filtered signal

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches that when the background noise spectrum estimate has been calculated the updating of the background noise…

teaches that the energy contents of a signal section that are referred to an energy threshold called ET are evaluated…

teaches about the computer aided telephony server CTI server includes the digital signal processor…

discloses the third period defined to include at least the sum of the first period and the second period setting of the…
XXXXXXXXXXXXXXXX
11

EP1093112A2

(Mazin G. Rahim, 2001)
(Original Assignee) AT&T Corp     

(Current Assignee)
AT&T Corp
A method for generating speech feature signals and an apparatus for carrying through this method audio signal, digital audio signal domain representation, domain converter

calculating means, speech recognition method following steps

next feature vector filtered signal

word identification LPC coefficient

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches that when the background noise spectrum estimate has been calculated the updating of the background noise…

teaches that the energy contents of a signal section that are referred to an energy threshold called ET are evaluated…

teaches about the computer aided telephony server CTI server includes the digital signal processor…

discloses the third period defined to include at least the sum of the first period and the second period setting of the…
XXXXXXXX
12

US5881312A

(Carole Dulong, 1999)
(Original Assignee) Intel Corp     

(Current Assignee)
Intel Corp
Memory transfer apparatus and method useful within a pattern recognition system result memory performing comparison

comprising dynamic scheduling control function

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses the features indicated above as evidenced by the fact that one of ordinary skill in the art would clearly…

discloses a method of performing a search originating from a mobile device eg portable wireless device…

discloses a system to detect a target in an image based on extracting different features in an image normalizing each of…

teaches a plurality of tasks which are disclosed as topics can be retrieved as a new task based on the user s action…
XX
13

JPH08110793A

(Alejandro Acero, 1996)
(Original Assignee) Microsoft Corp; マイクロソフト コーポレイション     特性ベクトルの前端正規化による音声認識の改良方法及びシステム [Improved speech recognition method and system using front-end normalization of feature vectors] search stage, search stage processing 受け取る段階 [receiving step]

speech recognition method システム [system]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches the claimed determining a largest common substring…

describes and makes obvious those limitations as indicated there…

discloses tagging a place in an audio media in order to note a particular time at which a word appears in the media…

describes processing a voice message as the embodiment for stored audio data…
XXXXXXXX
14

CN1151218A

(沙-平·托马斯·王 [Sha-Ping Thomas Wang], 1997)
(Original Assignee) Motorola Inc     

(Current Assignee)
Motorola Solutions Inc
用于语音识别的神经网络的训练方法 [Method of training a neural network for speech recognition] speech recognition circuit, speech recognition method 用于识别 [for recognition], 语音识别 [speech recognition]

result memory 上述一个 [the aforementioned one]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
describes the claimed limitations as a whole recognizable to one versed in the art as the embodiment for processing…

teaches visual information including identification of a voice shortcut for each service in a subset of services…

discloses a conversion step executed by accessing a stored phoneme-grapheme assignment table at col…

teaches a wireless device that allows the user to scroll through various menus and select various menu items through…
XXXXXXXXXXXXXXXX
15

EP0627726A1

(Takao Watanabe, c/o NEC Corporation, 1994)
(Original Assignee) NEC Corp     

(Current Assignee)
NEC Corp
Pattern recognition with a tree structure used for reference pattern feature vectors or for HMM feature vector recognition method

control means, calculating circuit calculating step, control means

35 U.S.C. 103(a)

35 U.S.C. 102(b)
describes and makes obvious those limitations as indicated there…

describes processing a voice message as the embodiment for stored audio data…

discloses a method of identifying among a plurality of audio files in digital format generated by machine…

teaches a method for constructing model for speech recognition a speaker independent speech recognition model is…
XXXXXXXX
16

JPH07248790A

(Ryosuke Hamazaki, 1995)
(Original Assignee) Fujitsu Ltd; 富士通株式会社     音声認識システム [Speech recognition system] next feature 備えること [comprising]

distance results, result memory 検出結果 [detection result]

speech recognition circuit 音声信号 [speech signal]

calculating circuit 演算処理 [arithmetic processing]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses listing candidate words from the core lexicon and candidate words from the extended lexicon…

teaches a memory storing a plurality of programs including a telephone-calling program having a predetermined message…

teaches wherein the detecting the voice input comprises detecting a trigger signal by a manipulation of a…

discloses making the transition includes fading between one visual output and another visual output…
XXXXXXXXXXXXXXX
17

JPH06348292A

(Takao Watanabe, 1994)
(Original Assignee) Nec Corp; 日本電気株式会社     音声認識システム [Speech recognition system] speech recognition method フレームベクトル [frame vector], システム [system]

distance calculation, calculating distances の距離 [distance]

35 U.S.C. 103(a)

35 U.S.C. 102(b)
describes and makes obvious those limitations as indicated there…

describes processing a voice message as the embodiment for stored audio data…

discloses a method of identifying among a plurality of audio files in digital format generated by machine…

teaches a method for constructing model for speech recognition a speaker independent speech recognition model is…
XXX
18

EP1400814A2

(Amada Tadashi, 2004)
(Original Assignee) Toshiba Corp     

(Current Assignee)
Toshiba Corp
Directional setting apparatus, directional setting system, directional setting method and directional setting program distance results detection period

calculating means signals output

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches a system comprising a memory that stores instructions a processor that executes the instructions to perform…

discloses a method of calibrating a sound utilizing a microphone array…

teaches wherein the adder calculates the added value based on the input signals and the subtracter calculates the…
XX
19

US20040054532A1

(Dieter Staiger, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Nuance Communications Inc
Method and processor system for processing of an audio signal accelerator signals control signal

calculating circuit, control means control means

audio signal, speech accelerator audio signal

XXXXXXXXXX
20

US20040117189A1

(Ian Bennett, 2004)
(Original Assignee) Bennett Ian M.     

(Current Assignee)
Nuance Communications Inc
Query engine for processing voice based queries including semantic decoding digital audio, digital audio signal server architecture

audio time frame, time frame speech data

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches wherein said interjected ideas are generated or selected based on natural language and reasoning techniques…

discloses a method further comprising before creating the term clusters reducing dimensionality of the term vectors…

teaches the invention substantially as claimed as discussed above in claim…

teaches a method for creating context specific document questions comprising a obtaining an input sentence…
XXXXXX
21

US20040083092A1

(Luis Valles, 2004)
(Original Assignee) Valles Luis Calixto     

(Current Assignee)
Gyrus Logic Inc
Apparatus and methods for developing conversational applications calculating distances calculated distance

distance calculation second program

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
discloses rearranging the order of claims via adding a claim and properly imbedding the added claim…

discloses a method of preparing a patent application comprising the steps of numbering and rearranging the order of…

discloses finding each text string that contains at least one of the query terms to generate an abstract column…

teaches the use of several thresholds as a means to determine a relevancy score of a word/phrase wherein a double…
XX
22

CN1474379A

(小林载, 2004)
(Original Assignee) Pioneer Corp     

(Current Assignee)
Pioneer Corp
语音识别/响应系统、语音/识别响应程序及其记录介质 [Speech recognition/response system, speech recognition/response program, and recording medium therefor] speech recognition circuit 语音识别 [speech recognition], 识别用户 [identify the user]

speech accelerator 语音输入 [voice input]

acoustic state 通过将 [by means of (fragment)]

35 U.S.C. 103(a)

35 U.S.C. 102(a)
discloses a method comprising determining whether a mobile computing device…

discloses an electronic device that is connected to an external electronic device and a controller for controlling the…

teaches identifying at least one object displayed in the active state image of the target computer program which may…

discloses executing the trigger word detection routine continually until it is deactivated…
XXXXXXXXXXXXXXXX
23

US20040148170A1

(Alejandro Acero, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Statistical classifiers for spoken language understanding and command/control scenarios speech accelerator statistical language

front ends search service

feature vector feature vector

result memory speech signal

XXXXXXXX
24

US20040078202A1

(Shin Kamiya, 2004)
(Original Assignee) Sharp Corp     

(Current Assignee)
Sharp Corp
Speech input communication system, user terminal and center system speech accelerator voice recognition

calculating circuit, control means control means

XXXXXX
25

JP2004234273A

(Toshihiro Kujirai, 2004)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     対話型端末装置及び対話アプリケーション提供方法 [Interactive terminal device and method for providing an interactive application] result memory 認識結果 [recognition result]

audio time frame, time frame ワーク [work (fragment)]

next feature 基づき [based on]

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses the method of automatically correcting an input location including a typographical mistake according to claim…

teaches modifying an itinerary taking into account user preferences…

discloses an information providing method for providing information to an occupant of a vehicle the method comprising…

discloses that it is known to determine that a user has requested more information regarding a particular interactive…
XXXXXXX
26

JP2004226881A

(Takashi Matsuda, 2004)
(Original Assignee) Casio Comput Co Ltd; カシオ計算機株式会社     会話システム及び会話処理プログラム [Conversation system and conversation processing program] distance results, result memory 判定結果 [determination result]

first processor メモリ [memory]

feature vector 前記発 [said utterance (fragment)]

XXXXXXXXXX
27

US20040128137A1

(William Bush, 2004)
(Original Assignee) Bush William Stuart; Roura Carlos Ferdinand     Hands-free, voice-operated remote control transmitter speech recognition circuit speech recognition circuit

digital audio, digital audio signal voice signals

distance calculation control unit

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses purchasing an item/service through an electronic device via voice command…

teaches a method for controlling a system especially an electrical and/or electronic system comprising storing control…

discloses generating loading instructions but does not specifically disclose doing so after scanning of a package…

discloses a presentation audio system comprising at least one IR audio/video controller wherein said IR audio/video of…
XXXXXXXXXXXXXXX
28

US20040138890A1

(James Ferrans, 2004)
(Original Assignee) Motorola Inc     

(Current Assignee)
Google Technology Holdings LLC
Voice browser dialog enabler for a communication system speech recognition circuit, word identification recognizing speech

digital audio, digital audio signal subsequent input

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses that a translating step comprises using speech recognition in a VXML gateway to convert spoken words of the…

teaches wherein the means for maintaining the recognized result includes means for indicating that the recognized…

teaches co articulation that is the identification of phrases such as verb or noun phrases with the ability to isolate…

teaches a method of integrating conversational speech into a multimodal…
XXXXXXXXXXXXXXX
29

JP2004212641A

(Masahide Arisei, 2004)
(Original Assignee) Toshiba Corp; 株式会社東芝     音声入力システム及び音声入力システムを備えた端末装置 [Voice input system and terminal device equipped with the voice input system] speech recognition circuit 音声信号 [speech signal]

result memory の結果 [result (fragment)]

XXXXXXXXXXXXXX
30

US20040111264A1

(Zhong-Hua Wang, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Nuance Communications Inc
Name entity extraction using language models calculating means, speech recognition method following steps

second processor start node

XXXX
31

JP2004153732A

(Yasuhiko Katsuki, 2004)
(Original Assignee) Toshiba Eng Co Ltd; 東芝エンジニアリング株式会社     介護施設監視システム [Care facility monitoring system] speech recognition method システム [system]

search stage code LAN

XXX
32

US20030110033A1

(Hamid Sheikhzadeh-Nadjar, 2003)
(Original Assignee) Dspfactory Ltd     

(Current Assignee)
AMI Semiconductor Inc
Method and system for real-time speech recognition feature vector, search stage to identify words Discrete Cosine Transform

speech recognition circuit, word identification recognizing speech

second processor, dynamic scheduling second processor

control means calculating step

first processor first processor

interrupt signal second buffer

XXXXXXXXXXXXXXXX
33

US20030050783A1

(Shinichi Yoshizawa, 2003)
(Original Assignee) Individual     

(Current Assignee)
Panasonic Holdings Corp
Terminal device, server device and speech recognition method speech recognition method speech recognition method

speech accelerator speech recognition means

calculating distances determining means

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses that the inventive subject matter may be implemented in various speech recognition systems andor apparatus…

discloses wherein the speech encoded in the audio data was detected by the client device and wherein the information…

discloses a system comprising at least one processor device and one or more instructions that when implemented in the at…

discloses presenting an indication of readiness to complete a further portion of a speech-facilitated transaction col…
XXXX
34

JP2004096520A

(Shunji Muraoka, 2004)
(Original Assignee) Hosiden Corp; ホシデン株式会社     音声認識リモコン [Speech recognition remote control] accelerator signals アクティブ [active]

speech recognition circuit 音声信号 [speech signal]

audio signal データ [data]

XXXXXXXXXXXXXXXX
35

US20030046073A1

(Shinsuke Mori, 2003)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Nuance Communications Inc
Word predicting method, voice recognition method, and voice recognition apparatus and program using the same methods speech recognition method probability distribution

feature vector recognition method

speech accelerator voice recognition, word string

XXXXXX
36

EP1423846A1

(Yoav Degani, 2004)
(Original Assignee) VoiceSense Ltd     

(Current Assignee)
VoiceSense Ltd
Method and apparatus for speech analysis calculating circuit cellular telephone

front end, front ends external output

speech recognition method speech segments

speech recognition circuit voice segment

speech accelerator, accelerator signals voice channel, equal length

result memory speech signal

first processor sampled data

word identification said unit

XXXXXXXXXXXXXXXX
37

WO2004015686A1

(Ralph Schleifer, 2004)
(Original Assignee) Telefonaktiebolaget Lm Ericsson (Publ)     Method for automatic speech recognition feature calculation keyword model

first processor memory part

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches a device which teaches a system corresponding to the method of claim…

teaches a method comprising identifying one or more options for a compounded word in a source language each split…

teaches wherein the task execution is offered as a pay for use service…

teaches the use of user input to determine when to decompose a compound complex word for an application in speech…
XXXXX
38

JP2004012653A

(Takashi Akiyama, 2004)
(Original Assignee) Matsushita Electric Industrial Co Ltd     

(Current Assignee)
Panasonic Holdings Corp
音声認識システム、音声認識クライアント、音声認識サーバ、音声認識クライアントプログラムおよび音声認識サーバプログラム [Speech recognition system, speech recognition client, speech recognition server, speech recognition client program, and speech recognition server program] first processor, processor pursuant 前記音声認識システム [said speech recognition system]

next feature 備えること [comprising]

feature calculation 演算手段 [arithmetic means], 記憶装置 [storage device]

word identification 識別手段 [identification means]

next feature vector 音声識 [speech recognition (fragment)]

XXXXXX
39

US20020099542A1

(John C. Mitchell, 2002)
(Original Assignee) AllVoice Computing PLC     

(Current Assignee)
ALLVOICE DEVELOPMENTS US LLC
Method and apparatus for processing the output of a speech recognition engine speech recognition method speech recognition method

speech accelerator, digital audio audio signal, audio data signal

frame dropping display control

interrupt signal processing data

calculating circuit, control means control means, said input

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses the language processing and memory module of claim…

teaches a method for speech translation which describes a system for translating phrases in spoken language see…

teaches providing at least one input to a plurality of modules…

discloses a tactilefeedback device configured to present predetermined perception to a user…
XXXXXXXXX
40

CN1493071A

(丁华镇, 2004)
(Original Assignee) SUNWOO TECHNO Inc     

(Current Assignee)
SUNWOO TECHNO Inc
用于语音识别系统的语音命令鉴别器 [Voice command discriminator for a speech recognition system] word identification, speech recognition circuit 进行识别 [perform recognition], 以识别 [to identify]

result memory 判定结果 [determination result]

35 U.S.C. 103(a)

teaches wherein the first microphone of the two-microphone array is a unidirectional microphone the second microphone…

teaches wherein the circuitry for generating weighted coefficients in a transform domain further comprises mask…

teaches a region which could be modeled by a matrix corresponding to a particular chromaticity range of LEDs with the…

teaches an adaptive filter comprising a filter circuit comprising a plurality of weighted filter elements each…
XXXXXXXXXXXXXXXX
41

GB2391679A

(Mark Catchpole, 2004)
(Original Assignee) Zentian Ltd     

(Current Assignee)
Zentian Ltd
Speech recognition circuit using parallel processors speech recognition method speech recognition method

front end search algorithm

calculating distances feature vectors

control means said input

XXXXXXXXX
42

CN1494711A

(Stephane H. Maes, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
使用多模式输入进行多模式焦点检测、参考歧义解析和语气分类的系统和方法 [System and method for multimodal focus detection, reference ambiguity resolution, and mood classification using multimodal input] calculating means, feature calculation 计算方法 [calculation method], 计算系统 [calculation system]

digital audio 多个音频 [multiple audio]

speech recognition circuit 语音识别 [speech recognition]

to discard one 数据执行 [data execution (fragment)]

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches an orientation of a face or pose of a face as shown in fig…

discloses said response message includes a request for a rating of said core information therein see at least paragraph…

teaches the method wherein said receiving step comprises receiving a plurality of training analogue data sets and…

discloses a mobile terminal supporting a voice talk function…
XXXXXXXXXXXXXXX
43

EP1215658A2

(Stephen John Hinde, 2002)
(Original Assignee) Hewlett Packard Co     

(Current Assignee)
HP Inc
Visual activation of voice controlled apparatus speech accelerator speech recognition means

time frame same characteristic

digital audio, digital audio signal said time

35 U.S.C. 103(a)

discloses activating the speech recognition based on the output of the image processing unit so only when the user is…

XXXXXXX
44

US6721699B2

(Bo Xu, 2004)
(Original Assignee) Intel Corp     

(Current Assignee)
Intel Corp
Method and system of Chinese speech pitch extraction audio signal window function

result memory speech signal

time frame square root

XXXXXXX
45

EP1318505A1

(Shunji Mitsuyoshi, 2003)
(Original Assignee) A G I Inc; AGI Inc     

(Current Assignee)
A G I Inc ; AGI Inc
Emotion recognizing method, sensibility creating method, device, and software speech accelerator voice recognition

result memory receiving pieces

accelerator signals time intervals

audio front end signal input

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses the extraction of features for context unitskey phrases that include speaker sideidentity…

discloses that the emotion change detecting unit calculates changesfluctuation in intonation on all words which would…

teaches applying a clustering algorithm to the depth map see column…

teaches an emotion detection system and a method for training the system comprising the extraction of prosody…
XXXXXX
46

US6594348B1

(Hans Bjurstrom, 2003)
(Original Assignee) Pipebeach AB     

(Current Assignee)
Hewlett Packard Development Co LP
Voice browser and a method at a voice browser calculating distances audio objects

distance results time window

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses the method/system for evaluating a voice interaction between an agent and a customer utilizing speech…

teaches all aspects of the claimed invention set forth in the rejection of claim…

teaches selectively providing the plurality of interconnected speech enabled sites col…

discloses applying a set of incoming SMS message rules to incoming SMS message and applying a set of outgoing SMS…
XX
47

US20030036903A1

(Courtney Konopka, 2003)
(Original Assignee) Sony Corp     

(Current Assignee)
Sony Corp ; Sony Electronics Inc
Retraining and updating speech models for speech recognition second processor digital representation

speech recognition circuit, word identification recognizing speech

audio time frame, time frame speech data

result memory plural user

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches a natural language processing apparatus which executes morphological analysis using connection cost…

discloses speech recognition device and speech recognition method title comprising executing noise adaptation processing…

teaches retraining and updating speech models for speech recognition and…

discloses system and method for facilitating user access to services with wireless device and voice command abstract…
XXXXXXXXXXXXXXXX
48

US20020161579A1

(Richard Saindon, 2002)
(Original Assignee) Speche Communications     

(Current Assignee)
COURTROOM CONNECT
Systems and methods for automated audio transcription, translation, and transfer control means comprises information

speech accelerator first language

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches utilizing instant message technology which is performed on a local platform to access to services such as…

teaches a payment method where the service charge is redirected to the called party or even a third party very much…

teaches media translator for converting among speech signals and textual information…

discloses all the claimed features but does not disclose the advisory system as claimed in claim…
XX
49

US6483927B2

(Hugh L. Brunk, 2002)
(Original Assignee) Digimarc Corp     

(Current Assignee)
Digimarc Corp
Synchronizing readers of hidden auxiliary data in quantization-based data hiding schemes speech accelerator, audio signal message signal, audio signal

frame dropping video signal

XXXXXXXX
50

US6785647B2

(William R. Hutchison, 2004)
(Original Assignee) William R. Hutchison     

(Current Assignee)
ME ME ME Inc ; Sensory
Speech recognition system with network accessible speech processing resources speech recognition circuit, word identification extraction process

to discard one third interface

XXXXXXXXXXXXXXX
51

US20010047258A1

(Anthony Rodrigo, 2001)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Method and system of configuring a speech recognition system speech accelerator speech recognition means

calculating distances determining means

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches generating a list of unacceptable remote locations and wirelessly transmitting the electronic information to…

teaches a method for providing performance features for mobile subscribers via a communications network…

discloses wherein said key event is selected from the group consisting of key up, key down, and accept…

discloses collecting data from different sources and caching data content in the cache eg…
XX
52

US20020091527A1

(Shyue-Chin Shiau, 2002)
(Original Assignee) VerbalTek Inc     

(Current Assignee)
VerbalTek Inc
Distributed speech recognition server system for mobile internet/intranet communication speech accelerator speech recognition means

digital audio, digital audio signal voice signals

audio signal mobile phones

data flow data flow

XXXXXXXX
53

US20020077830A1

(Riku Suomela, 2002)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Oyj
Method for activating context sensitive speech recognition in a terminal dynamic scheduling, comprising dynamic scheduling secondary input

first processor one second

control means said input

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses a method and device hereinafter referenced as a method for controlling an electronic apparatus the method…

discloses a method wherein the text comprises a plurality of words and wherein the deleting the selected text comprises…

teaches determining whether a sound input corresponds to a particular user see…

discloses a presentation audio system comprising at least one IR audio/video controller wherein said IR audio/video of…
XXXXX
54

US6415256B1

(Richard Joseph Ditzik, 2002)
(Original Assignee) Richard Joseph Ditzik     

(Current Assignee)
NETAIRUS TECHNOLOGIES LLC
Integrated handwriting and speech recognition systems distance results recognition accuracy

feature vector, feature calculation said display screen

dynamic scheduling, digital audio more display

control means, result memory writing data, said input

audio time frame, time frame speech data

word identification more port

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches adjusting the speaker volume and/or the microphone gain to provide optimum audio signal levels for that…

teaches the method and computer program for converting the audio portion of an AV signal into captions that recognizes…

discloses a phoneme generator which parses the speech portion into phonemes in accordance with a speech model speech…

teaches having holes in the touch sensitive touchpad through which mechanical switches can be actuated…
XXXXXXXXXX
55

EP1107227A2

(Yasuharu Asano, 2001)
(Original Assignee) Sony Corp     

(Current Assignee)
Sony Corp
Voice processing data flow recording program

speech accelerator voice recognition

calculating circuit, control means control means, input voice

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches wherein said arrangement for applying at least one emotionbased paradigm is adapted to alter at least one of…

discloses that text delivered in a chat environment can be animated column…

teaches applying emotion values to concatenative synthesis but does not particularly state whether the emotion values…

discloses wherein said dynamically indicating further comprises identifying a portion of said audio message which is…
XXXXXXXX
56

CN1293428A

(刘加, 2001)
(Original Assignee) Tsinghua University     

(Current Assignee)
Tsinghua University ; Qinghua University ; QINGHUA UNIV
基于语音识别的信息校核方法 (Information verification method based on speech recognition) speech accelerator 语音识别方法, 语音编码

calculating circuit, calculating means 计算机中

speech recognition circuit, distance results 识别结果

audio signal, audio time frame 音命令

XXXXXXXXXXXXXXXX
57

EP1100073A2

(Mototsugu Abe, 2001)
(Original Assignee) Sony Corp     

(Current Assignee)
Sony Corp
Classifying audio signals for later data retrieval calculating circuit generating means

distance results n points

XXXX
58

EP1189202A1

(Krzysztof Marasek, 2002)
(Original Assignee) Sony International Europe GmbH     

(Current Assignee)
Sony Deutschland GmbH
Duration models for speech recognition control means comprises information

result memory said determination, speech signal

calculating means, speech recognition method following steps

feature vector feature vector

search stage processing time alignment

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches that the artificial neural network comprises a single hidden layer…

discloses wherein a correlation coefficient is determined from standard deviations and covariance of a plurality of…

teaches further comprising multiple confidence values each corresponding to one of the multiple speech recognizers for…

discloses receiving, during a first time interval, image data associated with a path of motion of a dynamic body, the image…
XXXXXXXXX
59

US6718308B1

(Daniel L. Nolting, 2004)
(Original Assignee) Daniel L. Nolting     Media presentation system controlled by voice to text commands calculating means desired characteristic

speech accelerator voice recognition

first processor first direction

XXXXXX
60

US6757718B1

(Christine Halverson, 2004)
(Original Assignee) SRI International Inc     

(Current Assignee)
IPA TECHNOLOGIES INC.
Mobile navigation of network-based electronic information using spoken input calculating circuit cellular telephone

calculating means control device

35 U.S.C. 103(a)

35 U.S.C. 102(e)
discloses a system for sending and receiving content from at least one server connected to a network a computer and…

teaches the use of the user's supplemental input request processing logic…

teaches a system and method for agent-based navigation in a speech-based data navigation system where when a spoken…

teaches the step of providing voice mail services to the authenticated caller column…
XXXX
61

US6453252B1

(Jean Laroche, 2002)
(Original Assignee) Creative Technology Ltd     

(Current Assignee)
Creative Technology Ltd
Process for identifying audio content digital audio signal digital audio signal

second processor selected frequency

next feature desired frequency

next feature vector filtered signal

XXX
62

GB2358253A

(Ho Jinyama, 2001)
(Original Assignee) KYUSHU KYOHAN Co Ltd     

(Current Assignee)
KYUSHU KYOHAN Co Ltd
Signal identification device using genetic algorithm and on-line identification system calculating means calculating means

word identification said signal

XX
63

US6629073B1

(Hsiao-Wuen Hon, 2003)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Speech recognition method and apparatus utilizing multi-unit models speech recognition circuit, word identification recognizing speech

calculating means training data

result memory speech signal

35 U.S.C. 103(a) discloses in a traditional speech recognition system conversation in phone calls was recorded and was used for…
XXXXXXXXXXXXXXX
64

CN1268732A

(刘加, 2000)
(Original Assignee) Tsinghua University     

(Current Assignee)
Tsinghua University ; Qinghua University ; QINGHUA UNIV
基于语音识别专用芯片的特定人语音识别、语音回放方法 (Speaker-specific speech recognition and speech playback method based on a dedicated speech recognition chip) speech recognition circuit, distance results 识别结果, 作为结果

speech accelerator 语音编码

audio signal, audio time frame 音命令

calculating circuit 数计算

XXXXXXXXXXXXXXXX
65

US6732142B1

(Cary Lee Bates, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Google LLC
Method and apparatus for audible presentation of web page content accelerator signals time intervals

digital audio, digital audio signal said time

XX
66

US6687341B1

(Robert A. Koch, 2004)
(Original Assignee) BellSouth Intellectual Property Corp     

(Current Assignee)
Open Invention Network LLC
Network and method for the specification and delivery of customized information content via a telephone interface distance calculation predetermined destination

interrupt signal incoming telephone call

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches converting textbased messages such as email messages or facsimile messages stored in the IMAP message store…

teaches the IRS also includes an audio-based interface interpreter for converting the audio-based interface document to…

discloses a system for sending and receiving content from at least one server connected to a network a computer and…

teaches of a system of a network based auto attendant system with a provider network using enterprise voice directory…
XX
67

US6349132B1

(Darren L. Wesemann, 2002)
(Original Assignee) Talk2 Tech Inc     

(Current Assignee)
Talk2com ; Intellisync LLC
Voice interface for electronic documents next feature vector hierarchical relationship

calculating means third part

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches selecting portions of the visual information and audible information for inclusion into a multimedia…

discloses the claimed aspect of the received electronic visual content is received as a result of a document scanning…

describes a different descriptive material than the claim then the descriptive material is nonfunctional and will not be…

discloses that the solution is to provide a special server/gateway for broadband networks such as cable television networks and…
XX
68

US6442519B1

(Dimitri Kanevsky, 2002)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Nuance Communications Inc
Speaker model adaptation via network of similar users speech recognition method speech recognition method

speech recognition circuit, word identification recognizing speech

frame dropping noise generation, sound generation

accelerator signals particular user

XXXXXXXXXXXXXXXX
69

JP2001117583A

(Hideki Kishi, 2001)
(Original Assignee) Sony Corp; ソニー株式会社     音声認識装置および音声認識方法、並びに記録媒体 (Speech recognition device, speech recognition method, and recording medium) next feature 備えること

result memory 処理結果, 認識結果

audio signal データ

XXXXXXXX
70

US6539353B1

(Li Jiang, 2003)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Confidence measures using sub-word-dependent weighting of sub-word confidence scores for robust speech recognition speech accelerator, speech recognition method discriminative training

dynamic scheduling different sub

digital audio digital value

result memory speech signal

audio signal noise signal

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses a phrase-based dialogue modeling method for producing a low-perplexity recognition grammar from a conventional…

discloses a method and apparatus for recognizing speech comprising a the steps of receiving a speech phrase…

discloses wherein the discriminant function calculation component evaluates the likelihood of the hypothesized speech…

discloses a speech processing device comprising a prior probability table as claimed in the instant claim the function…
XXXXXXXXX
71

US6542866B1

(Li Jiang, 2003)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Speech recognition method and apparatus utilizing multiple feature streams speech accelerator second language, first language

digital audio digital value

result memory speech signal

data flow one path

XXXXX
72

US6778557B1

(Yoshinori Yuki, 2004)
(Original Assignee) Toshiba Corp     

(Current Assignee)
Toshiba Corp
Point-to-multipoint communication system control means comprises information

interrupt signal second notification

distance results, distance calculation measurement result

calculating circuit data storage means

time frame first transmit

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches an optical line terminal connected to a plurality of optical network units reference numeral…

discloses an information processing device forming a redundant system by an operation with a plurality of information…

teaches an enable signal for mini-slot payload and receiving mini-slot payload bytes and sending them upstream on the…

discloses a method for enhancing network transmission in a communications system…
XXXXXXXXX
73

US6446039B1

(Yasunaga Miyazawa, 2002)
(Original Assignee) Seiko Epson Corp     

(Current Assignee)
Seiko Epson Corp
Speech recognition method, speech recognition device, and recording medium on which is recorded a speech recognition processing program speech recognition method speech recognition method

time frame when r

35 U.S.C. 103(a) teaches a prior art candidate confirmation device which differed from the claimed device by the substitution of one…

teaches a limit in presenting candidates before giving up and asking again altogether PAGE…

teaches segmenting the received speech signal into a plurality of sections speech input divided into time frames…
XXXXX
74

US6366578B1

(Christopher Sean Johnson, 2002)
(Original Assignee) Vertical Networks Inc     

(Current Assignee)
RPX Corp ; Vertical Networks Inc
Systems and methods for multiple mode voice and data communications using intelligently bridged TDM and packet buses and methods for implementing language capabilities using the same accelerator signals particular user

comprising dynamic scheduling allocation rule

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses transferring a call that is still ringing by performing a blind transfer with the transfer button paragraph…

teaches a multiple mode voice and data communication system with language capabilities where backup communications…

discloses a management module operable to assign at least one time slot of a time division multiplexing TDM bus to…

teaches that a voice mail is purged permanently deleted from memory…
XX
75

JP2000322078A

(Osamu Hattori, 2000)
(Original Assignee) Sumitomo Electric Ind Ltd; 住友電気工業株式会社     車載型音声認識装置 (Vehicle-mounted speech recognition device) distance results, result memory 判定結果

first processor メモリ

XXXXX
76

JP2000310999A

(Hideyuki Yamagishi, 2000)
(Original Assignee) Asahi Chem Ind Co Ltd; 旭化成工業株式会社     設備制御システム (Equipment control system) dynamic scheduling 処理回路と

next feature 基づき

XX
77

GB2336974A

(Leon Lumelsky, 1999)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Singlecast interactive radio system speech accelerator speech recognition means

calculating circuit generating means

first processor data repository

accelerator signals control signal

digital audio, digital audio signal voice signals

XXXXXXXXX
78

US6718015B1

(Viktors Berstis, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Google LLC
Remote web page reader first processor personal communication

front end one link

XXXXXXXX
79

US20020006126A1

(Gregory Johnson, 2002)
(Original Assignee) Motorola Inc     

(Current Assignee)
Motorola Solutions Inc
Methods and systems for accessing information from an information source speech accelerator voice recognition

feature calculation incoming call

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses providing voice mail data in response to the at least one voice command…

teaches scanning the voicemail message using a voice recognition process to obtain the requested…

discloses a system that captures the interaction between a sales agent and a customer and based on the context of this…

teaches a conversational browser voice browser of a computing device that provides a conversational user interface to…
XX
80

US6138095A

(Sunil K. Gupta, 2000)
(Original Assignee) Lucent Technologies Inc     

(Current Assignee)
Nokia of America Corp
Speech recognition speech recognition circuit, word identification recognizing speech

calculating distances feature vectors

result memory speech signal

control means said input

XXXXXXXXXXXXXXX
81

EP0901000A2

(Taizo Asaoka, 1999)
(Original Assignee) Toyota Motor Corp     

(Current Assignee)
Toyota Motor Corp
Message processing system and method for processing messages calculating circuit, search stage processing output timing

speech accelerator, speech recognition circuit second voice, first voice

35 U.S.C. 103(a) discloses a lookup table or plot of parameters that is equivalent to an interference profile record to output an audio…

discloses an audio player including a pitch control for controlling a pitch of playing back the audio data see…

discloses modifying synthesis text based on a calculated read out time…
XXXXXXXXXXXXXXX
82

WO9857489A2

(James M. O'reilly, 1998)
(Original Assignee) Metalithic Systems, Inc.     Modular system for accelerating data searches and data stream operations result memory programmable logic device

interrupt signal processing data

XX
83

US6456965B1

(Suat Yeldener, 2002)
(Original Assignee) Texas Instruments Inc     

(Current Assignee)
Texas Instruments Inc
Multi-stage pitch and mixed voicing estimation for harmonic speech coders audio time frame, time frame time domain waveform

result memory speech signal

digital audio, digital audio signal said time

XXXXXXX
84

US6593956B1

(Steven L. Potts, 2003)
(Original Assignee) Polycom Inc     

(Current Assignee)
Polycom Inc
Locating an audio source calculating means, calculating distances positioning device

digital audio, audio signal video signals, audio sources

35 U.S.C. 103(a)

35 U.S.C. 102(e)
teaches that the metadata store comprises a removable storage device connectable to the camera arrangement the image…

teaches that the data handling medium is a transmission medium telecommunication network…

teaches that an apparatus operable to apply a harsher data compression to portions of a captured image not detected to…

discloses wherein greater importance is given to the approximate location within the image that the user is gazing at in…
XXXXXXX
85

US6757652B1

(Michael Lund, 2004)
(Original Assignee) Koninklijke Philips Electronics NV     

(Current Assignee)
Koninklijke Philips NV
Multiple stage speech recognizer second processor second processors

first processor first processor

XXXX
86

US6081779A

(Stefan Besling, 2000)
(Original Assignee) US Philips Corp     

(Current Assignee)
US Philips Corp
Language model adaptation for automatic speech recognition first processor, distance results reference values

calculating means, speech recognition method following steps

result memory speech signal

35 U.S.C. 103(a)

35 U.S.C. 102(b)
discloses using the first language model in a speech recognition process to recognize the spoken audio stream and…

discloses determining a device s location based on docking context…

discloses the metadata further characterizes an audio codec applied when generating the sequence of speech features a…

teaches recognizing words that have observable and meaningful relationships…
XXXXXXXXX
87

US5983180A

(Anthony John Robinson, 1999)
(Original Assignee) SoftSound Ltd     

(Current Assignee)
Longsand Ltd
Recognition of sequential data using finite state sequence models organized in a tree structure audio signal, digital audio signal temporal alignment

calculating means calculating means

calculating distances determining means

audio time frame, time frame speech data

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches modeling lists of recognizable words with lexical trees for speech recognition…

teaches a method for constructing a model for speech recognition; a speaker-independent speech recognition model is…

discloses the method comprising scoring each spoken document comprises calculating a document score as a combination of…

discloses calculating matching scores for each item from the list of items and find a best match…
XXXXXX
88

US6353661B1

(John Edson Bailey, 2002)
(Original Assignee) Bailey III, John Edson     Network and communication access systems digital audio continuous fashion

calculating means display portion

search stage processing volume level

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches all the claimed subject matters as discussed in claim…

teaches the status report transmitted from the mobile unit to the user interface unit according to one of SMTP POP…

teaches the memory storing the status report for a predefined length of time after the status report is transmitted to…

teaches a receiver for receiving positioning data from satellites allowing the processor to use the positioning data…
XXX
89

JPH11119791A

(Shinji Wakizaka, 1999)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所; Hitachi Ulsi Systems Co Ltd; 株式会社日立超エル・エス・アイ・システムズ     音声感情認識システムおよび方法 (Voice emotion recognition system and method) result memory 認識結果

search stage to identify words, processor to identify words の変形

audio signal データ

XXXXXXX
90

US6101467A

(Heinrich Bartosik, 2000)
(Original Assignee) US Philips Corp     

(Current Assignee)
Nuance Communications Austria GmbH
Method of and system for recognizing a spoken text speech accelerator speech recognition means

calculating means conversion data

audio time frame, time frame speech data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches a voice user interface system for producing input to a computer and a program for execution on said computer a…

discloses the storage of information on user s preferences and habits in a user profile database as a future improvement…

teaches limited vocabulary word spotting low perplexity with a parallel network of subword models used to model the…

discloses an apparatus system and method implementing a general purpose computer…
XXXXXX
91

US6771743B1

(Nicholas David Butler, 2004)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Nuance Communications Inc
Voice processing system, method and computer program product having common source for internet world wide web pages and voice applications interrupt signal incoming telephone call

feature calculation incoming call

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
teaches wherein the low-level descriptor language is Extensible Markup Language (XML) column…

discloses a data processing system implemented method for implementing a…

teaches all aspects of the claimed invention set forth in the rejection of claim…

teaches a method of preserving state for applications over a telephone interface using a voice application computer…
XX
92

US6038305A

(Alexander I. McAllister, 2000)
(Original Assignee) Bell Atlantic Network Services Inc     

(Current Assignee)
Google LLC
Personal dial tone service with personalized caller ID first processor control instructions

speech recognition method voice authentication

digital audio signal receiving signals

result memory speech signal

audio time frame, time frame speech data

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches the method for identifying a particular caller according to claim…

teaches extracting speech information from the input speech…

teaches tone prompting for the recording of the recipient's response during playing an announcement and upon detecting…

discloses the method and tangible data storage medium of the claimed invention…
XXXXXXXXXX
93

CN1171592A

(黄学东, 1998)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
采用连续密度隐藏式马尔克夫模型的语音识别方法和系统 (Speech recognition method and system using continuous-density hidden Markov models) calculating means 计算机系

acoustic states 的状态

computing extra front frames 一个表

35 U.S.C. 103(a)

35 U.S.C. 102(b)
teaches speech recognition system with a mapping algorithm which is essentially a mapping of the speech signal to a…

teaches the system is beneficial in improving the recognition capability of a speech recognition system…

teaches wherein each subspace is represented by a codebook wherein the mixture models are indicated by an index to the…

teaches applying maximum mutual information estimation col…
XXX
94

JPH10282986A

(Tomohito Nakagawa, 1998)
(Original Assignee) Hitachi Ltd; 株式会社日立製作所     音声認識方法およびそのモデル設計方法 (Speech recognition method and model design method therefor) audio time frame, time frame 音声データ, ワーク

feature calculation, distance calculation 計算方法

speech recognition method システム

XXXXX
95

US6208638B1

(Jack Rieley, 2001)
(Original Assignee) j 2 Global Communications Inc     

(Current Assignee)
j 2 Global Communications Inc ; J2 Cloud Services LLC
Method and apparatus for transmission and retrieval of facsimile and audio messages over a circuit or packet switched network second processor digital representation

speech accelerator voice message

35 U.S.C. 103(a)

35 U.S.C. 102(e)

35 U.S.C. 102(b)
discloses a system which transmits messages from a variety of different platforms email messages may be sent over the internet…

discloses the invention substantially as described in claims…

discloses a recording medium in which a program for making a computer execute processing the processing comprising…

teaches wherein the low-level descriptor language is Extensible Markup Language (XML) column…
XX
96

US6021181A

(Richard A. Miner, 2000)
(Original Assignee) Wildfire Communications Inc     

(Current Assignee)
Orange SA
Electronic voice mail message handling system word identification second command

speech accelerator voice message

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches an apparatus for processing a call from a calling party calling party…

teaches providing callers with selected user accessory responses…

discloses user control voice activated commands such as removing names from a contact database as disclosed in column…

teaches that it was well known in the art to have receiving means which includes a voice recognition unit for…
XX
97

US6018710A

(Michael J. Wynblatt, 2000)
(Original Assignee) Siemens Corporate Research Inc     

(Current Assignee)
Siemens Corp
Web-based interactive radio environment: WIRE calculating circuit calculation means

calculating means other parameters

35 U.S.C. 103(a)

35 U.S.C. 102(e)
describes a different descriptive material than the claim then the descriptive material is nonfunctional and will not be…

discloses an apparatus system and method implementing a general purpose computer…

teaches that it is old and well known to recommend nearby venues based on a manually inputted arbitrary location…

teaches the system allows for adding to the context-specific grammars col…
XXXX
98

US5881134A

(Peter J. Foster, 1999)
(Original Assignee) Voice Control Systems Inc     

(Current Assignee)
Philips North America LLC
Intelligent call processing platform for home telephone system word identification second command

speech recognition circuit, speech accelerator first voice

35 U.S.C. 103(a)

35 U.S.C. 102(b)

35 U.S.C. 102(e)
teaches an apparatus for processing a call from a calling party calling party…

discloses user control voice activated commands such as removing names from a contact database as disclosed in column…

teaches a voice input device an output device a wireless interface for communicating across at least a wireless…

teaches all the subject matter claimed note see the rejection of claim…
XXXXXXXXXXXXXXX




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20030200089A1

Filed: 2003-04-16     Issued: 2003-10-23

Speech recognition apparatus and method, and program

(Original Assignee) Canon Inc     (Current Assignee) Canon Inc

Kenichiro Nakagawa, Hiroki Yamamoto
US7979277B2
CLAIM 1
. A speech recognition circuit (digital watermark) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data (audio time frame, time frame) ;
read means for reading external data including vocabulary information ;
speech recognition means for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .
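For orientation, the three-stage pipeline recited in claim 1 of US7979277B2 (audio front end computing a feature vector per time frame, a calculating circuit computing distances to acoustic states, and a search stage mapping distances to words) can be sketched as follows. This is a generic, hypothetical illustration only: the function names, the FFT band-energy features, the squared-Euclidean distance, and the toy word list are invented stand-ins, not the implementation of either charted document.

```python
import numpy as np

# Hypothetical sketch of the claimed three-stage flow:
# front end -> distance calculation -> search. All values are toy placeholders.

def front_end(frame: np.ndarray, n_coeffs: int = 4) -> np.ndarray:
    """Derive a small feature vector from one audio time frame
    (here: log energy in a few equal frequency bands)."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spectrum, n_coeffs)
    return np.log(np.array([b.sum() for b in bands]) + 1e-10)

def distances(feature: np.ndarray, states: np.ndarray) -> np.ndarray:
    """Distance of the feature vector to each acoustic-state mean
    (squared Euclidean, standing in for a likelihood-style score)."""
    return ((states - feature) ** 2).sum(axis=1)

def search(dist: np.ndarray, state_to_word: list) -> str:
    """Pick the word whose state is nearest (a stand-in for a
    lexical-tree search over accumulated path scores)."""
    return state_to_word[int(np.argmin(dist))]

# Toy run: two acoustic states, one 16-sample audio frame
rng = np.random.default_rng(0)
frame = rng.standard_normal(16)
states = rng.standard_normal((2, 4))
word = search(distances(front_end(frame), states), ["yes", "no"])
assert word in ("yes", "no")
```

In the claim, the first and third stages run on a first processor and the middle stage on a second processor, with data pipelined between them; the sketch above collapses that onto one thread purely to show the data flow.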

US7979277B2
CLAIM 2
. A speech recognition circuit (digital watermark) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .

US7979277B2
CLAIM 3
. A speech recognition circuit (digital watermark) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .
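The dynamic scheduling recited in claim 3 (choosing whether the first processor runs front-end or search-stage code based on availability of distance results and/or buffer space for feature vectors) can be sketched as a simple policy. This is a hypothetical illustration; the buffer names, sizes, and return labels are invented and not drawn from either charted document.

```python
from collections import deque

# Hypothetical scheduling policy for a single processor that alternates
# between front-end and search work (loosely mirroring claim 3).
MAX_FEATURES = 4  # invented buffer capacity

def schedule_step(feature_buf: deque, distance_buf: deque) -> str:
    """Decide which code the first processor should run next."""
    if distance_buf:                     # distance results available -> search
        return "search"
    if len(feature_buf) < MAX_FEATURES:  # room for more feature vectors -> front end
        return "front_end"
    return "stall"                       # buffers full, no results: wait

# Toy runs
assert schedule_step(deque(), deque([0.1])) == "search"
assert schedule_step(deque(), deque()) == "front_end"
assert schedule_step(deque([None] * 4), deque()) == "stall"
```

The "stall" branch corresponds to the situation addressed by claim 8, where the processor may divert to another task if the data flow stalls.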

US7979277B2
CLAIM 4
. A speech recognition circuit (digital watermark) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .

US7979277B2
CLAIM 5
. A speech recognition circuit (digital watermark) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .

US7979277B2
CLAIM 6
. The speech recognition circuit (digital watermark) of claim 1 , comprising control means (said input) adapted to implement frame dropping , to discard one or more audio time frames .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermark (speech recognition circuit) ing technique .

US20030200089A1
CLAIM 7
. The apparatus according to claim 6 , wherein said management means deletes at least some items of the recognition vocabulary information on the basis of an instruction input from said input (control means) means .

US7979277B2
CLAIM 7
. The speech recognition circuit (digital watermark) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data (audio time frame, time frame) ;
read means for reading external data including vocabulary information ;
speech recognition means for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .
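Claim 7 recites a feature vector comprising spectral components of an audio signal for a predetermined time frame. As a minimal sketch, assuming a windowed magnitude spectrum (the patent does not prescribe this exact computation, and real front ends typically add mel filtering, log compression, and a DCT):

```python
import numpy as np

def spectral_feature_vector(frame: np.ndarray) -> np.ndarray:
    """Return spectral components of one audio time frame.

    Illustrative only: Hamming window followed by the magnitude of
    the real FFT. Claim 7 requires only "a plurality of spectral
    components", not this specific pipeline.
    """
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed))

# A 25 ms frame at 16 kHz is 400 samples -> 201 spectral components.
frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
fv = spectral_feature_vector(frame)
assert fv.shape == (201,)
```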

US7979277B2
CLAIM 8
. The speech recognition circuit (digital watermark) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 9
. The speech recognition circuit (digital watermark) of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data ;
read means for reading external data including vocabulary information ;
speech recognition means (speech accelerator) for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 10
. The speech recognition circuit (digital watermark) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 11
. The speech recognition circuit (digital watermark) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 12
. The speech recognition circuit (digital watermark) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 13
. A speech recognition circuit (digital watermark) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .
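Claim 13 limits the distance to a Mahalanobis distance. For context, a minimal sketch under the diagonal-covariance assumption common in HMM acoustic models (an assumption of this sketch, not a limitation stated in the claim):

```python
import numpy as np

def mahalanobis_distance(x: np.ndarray, mean: np.ndarray,
                         var: np.ndarray) -> float:
    """Squared Mahalanobis distance between feature vector x and a
    Gaussian acoustic state with per-dimension variances `var`
    (i.e., a diagonal covariance matrix)."""
    d = x - mean
    return float(np.sum(d * d / var))

x = np.array([1.0, 2.0])
mean = np.array([0.0, 0.0])
var = np.array([1.0, 4.0])
print(mahalanobis_distance(x, mean, var))  # 1/1 + 4/4 = 2.0
```

With the identity covariance this reduces to the squared Euclidean distance, which is why Mahalanobis distance is a natural dependent limitation on the generic "distance" of claim 1.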

US7979277B2
CLAIM 14
. A speech recognition circuit (digital watermark) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data (audio time frame, time frame) ;
read means for reading external data including vocabulary information ;
speech recognition means for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 5
. The apparatus according to claim 3 , wherein the external data is an image which contains the vocabulary information generated by a digital watermarking (speech recognition circuit) technique .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data (audio time frame, time frame) ;
read means for reading external data including vocabulary information ;
speech recognition means for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 8
. A speech recognition method (speech recognition method) for recognizing input speech , comprising : an input step of inputting speech data ;
a read step of reading external data including vocabulary information ;
a speech recognition step of making speech recognition of the speech data using the vocabulary information in the read external data , and recognition vocabulary information stored in a recognition vocabulary database ;
and an output step of outputting a speech recognition result of the speech recognition step .
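Independent claims 14-16 all turn on pipelined data flow from the front end, to the distance calculation, to the search stage. A hypothetical software analogy of that three-stage flow using Python generators follows; the patent describes processor/hardware pipelining, and the toy "search" here (best state per frame) merely stands in for the claimed lexical-tree search.

```python
import numpy as np

def front_end(frames):
    # Stage 1: one feature vector per audio time frame.
    for frame in frames:
        yield np.abs(np.fft.rfft(frame))

def distance_stage(features, state_means):
    # Stage 2: per-frame distances to each acoustic state
    # (squared Euclidean here, for illustration only).
    for fv in features:
        yield [float(np.sum((fv - m) ** 2)) for m in state_means]

def search_stage(distances):
    # Stage 3: stand-in for lexical-tree search - best state per frame.
    return [min(range(len(d)), key=d.__getitem__) for d in distances]

# Because the stages are generators, each frame flows stage-to-stage
# as soon as it is produced, mirroring the claimed pipelined data flow.
frames = [np.ones(8), np.zeros(8)]
states = [np.zeros(5), np.ones(5) * 8.0]
best = search_stage(distance_stage(front_end(frames), states))
```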

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20030200089A1
CLAIM 1
. A speech recognition apparatus for recognizing input speech , comprising : storage means for storing recognition vocabulary information for speech recognition ;
input means for inputting speech data (audio time frame, time frame) ;
read means for reading external data including vocabulary information ;
speech recognition means for making speech recognition of the speech data using the vocabulary information in the read external data , and the recognition vocabulary information ;
and output means for outputting a speech recognition result of said speech recognition means .

US20030200089A1
CLAIM 8
. A speech recognition method (speech recognition method) for recognizing input speech , comprising : an input step of inputting speech data ;
a read step of reading external data including vocabulary information ;
a speech recognition step of making speech recognition of the speech data using the vocabulary information in the read external data , and recognition vocabulary information stored in a recognition vocabulary database ;
and an output step of outputting a speech recognition result of the speech recognition step .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004086150A

Filed: 2003-04-14     Issued: 2004-03-18

Speech control device (音声制御装置)

(Original Assignee) Denso Corp; 株式会社デンソー     

Masayuki Takami (高見 雅之), Toru Nada (名田 徹)
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (データ: data) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (データ: data) for a predetermined time frame .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (備えること: comprising) vector from the front end .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, characterized by comprising (next feature): storage means for storing a plurality of commands to be uttered as speech recognition data; detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (データ: data) .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (データ: data) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (データ: data) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (データ: data) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (前記一: said one) , to the distance calculation , and to the word identification (識別手段: identification means) .
JP2004086150A
CLAIM 1
A speech control device which recognizes the content of a user's utterance and operates a controlled device according to that utterance, comprising: storage means for storing a plurality of commands to be uttered as speech recognition data (audio signal); detection means for detecting an operating state of the controlled device; identification means (word identification) for identifying, based on the operating state of the controlled device detected by the detection means, the commands selectable in that operating state from among the plurality of commands constituting the speech recognition data; and speech recognition means for recognizing the content of the user's utterance as one of those commands, using the commands identified by the identification means.

JP2004086150A
CLAIM 12
The speech control device according to claim 11, further comprising notification means for notifying that an operation by said one (feature calculation) of the commands cannot be executed.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004170765A

Filed: 2002-11-21     Issued: 2004-06-17

Speech processing apparatus and method, recording medium, and program (音声処理装置および方法、記録媒体並びにプログラム)

(Original Assignee) Sony Corp; ソニー株式会社     

Hiroaki Ogawa (小川 浩明)
US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (前記選択手段: said selection means) on the first processor .
JP2004170765A
CLAIM 8
The speech processing apparatus according to claim 7, further comprising: selection means for selecting a high-scoring path on the network based on matching between the input speech and the network; and pronunciation acquisition means for acquiring a pronunciation corresponding to the unknown word based on the network containing the path selected by the selection means (search stage processing).

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (音声処理方法: speech processing method) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (備えること: comprising) vector from the front end .
JP2004170765A
CLAIM 1
A speech processing apparatus which processes input speech and, based on the processing result, registers words contained in the input speech, characterized by comprising (next feature): recognition means for recognizing the continuous input speech; unknown-word determination means for determining whether the recognition result recognized by the recognition means contains an unknown word; network generation means for generating, when the unknown-word determination means determines that the recognition result contains an unknown word, a network having a path containing the sub-word at the time corresponding to the word boundary of the unknown word and a path not containing that sub-word; acquisition means for acquiring a word corresponding to the unknown word when the unknown-word determination means determines that an unknown word is contained; and registration means for registering the word acquired by the acquisition means in association with other information.

JP2004170765A
CLAIM 11
A speech processing method (speech accelerator) of a speech processing apparatus which processes input speech and, based on the processing result, registers words contained in the input speech, comprising: a recognition step of recognizing the continuous input speech; a determination step of determining whether the recognition result recognized in the recognition step contains an unknown word; a network generation step of generating, when it is determined in the determination step that the recognition result contains an unknown word, a network having a path containing the sub-word at the time corresponding to the word boundary of the unknown word and a path not containing that sub-word; an acquisition step of acquiring a word corresponding to the unknown word when the determination step determines that an unknown word is contained; and a registration step of registering the word acquired in the acquisition step in association with other information.

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (処理結果: processing result) .
JP2004170765A
CLAIM 1
A speech processing apparatus which processes input speech and, based on the processing result (result memory), registers words contained in the input speech, comprising: recognition means for recognizing the continuous input speech; unknown-word determination means for determining whether the recognition result recognized by the recognition means contains an unknown word; network generation means for generating, when the unknown-word determination means determines that the recognition result contains an unknown word, a network having a path containing the sub-word at the time corresponding to the word boundary of the unknown word and a path not containing that sub-word; acquisition means for acquiring a word corresponding to the unknown word when the unknown-word determination means determines that an unknown word is contained; and registration means for registering the word acquired by the acquisition means in association with other information.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (他の情報: other information) .
JP2004170765A
CLAIM 1
A speech processing apparatus which processes input speech and, based on the processing result, registers words contained in the input speech, comprising: recognition means for recognizing the continuous input speech; unknown-word determination means for determining whether the recognition result recognized by the recognition means contains an unknown word; network generation means for generating, when the unknown-word determination means determines that the recognition result contains an unknown word, a network having a path containing the sub-word at the time corresponding to the word boundary of the unknown word and a path not containing that sub-word; acquisition means for acquiring a word corresponding to the unknown word when the unknown-word determination means determines that an unknown word is contained; and registration means for registering the word acquired by the acquisition means in association with other information (word identification).




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1420486A

Filed: 2002-11-15     Issued: 2003-05-28

Decision tree based speech recognition (基于决策树的语音辨别)

(Original Assignee) Motorola Inc     (Current Assignee) Motorola Solutions Inc

李恒舜
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (间隔的: spaced) and/or availability of space for storing more feature vectors and/or distance results .
CN1420486A
CLAIM 10
. The method of building at least one decision tree according to claim 9 , wherein the potential thresholds are determined by dividing said range into evenly spaced (distance results) sub-ranges .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (一个副本: a copy) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1420486A
CLAIM 12
. A method of speech recognition , comprising the steps of : providing a sample speech signal processed into at least one feature vector , the feature vector representing spectral characteristics of the speech signal ; dividing the feature vector into a plurality of sub-feature vectors ; applying each sub-feature vector to a corresponding decision tree to obtain groups of model sub-vectors likely to indicate at least one phoneme of the sample speech signal , the decision tree being built by analyzing model sub-vectors obtained from a statistical speech model , wherein the decision tree has decisions based on selected thresholds chosen from potential thresholds , the selected thresholds being chosen by the change in variance between said model sub-vectors , said variance being determined from said mean values and the variance values associated with said model sub-vectors ; selecting a plurality of model sub-vectors from the groups of sub-feature vectors , thereby identifying a final candidate list of model sub-vectors ; and processing the final candidate list to provide a copy (interrupt signal) of the sample speech signal

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (频谱特: spectral characteristics) , to the distance calculation , and to the word identification .
CN1420486A
CLAIM 12
. A method of speech recognition , comprising the steps of : providing a sample speech signal processed into at least one feature vector , the feature vector representing spectral characteristics (feature calculation) of the speech signal ; dividing the feature vector into a plurality of sub-feature vectors ; applying each sub-feature vector to a corresponding decision tree to obtain groups of model sub-vectors likely to indicate at least one phoneme of the sample speech signal , the decision tree being built by analyzing model sub-vectors obtained from a statistical speech model , wherein the decision tree has decisions based on selected thresholds chosen from potential thresholds , the selected thresholds being chosen by the change in variance between said model sub-vectors , said variance being determined from said mean values and the variance values associated with said model sub-vectors ; selecting a plurality of model sub-vectors from the groups of sub-feature vectors , thereby identifying a final candidate list of model sub-vectors ; and processing the final candidate list to provide a copy of the sample speech signal .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20030200085A1

Filed: 2002-04-22     Issued: 2003-10-23

Pattern matching for large vocabulary speech recognition systems

(Original Assignee) Individual     (Current Assignee) Sovereign Peak Ventures LLC

Patrick Nguyen, Luca Rigazio
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end (search algorithm) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (first processor) , and said calculating circuit is implemented using a second processor (second processor, processing power) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030200085A1
CLAIM 10
. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models , comprising : receiving continuous speech input ;
generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor (first processor) ;
retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor (second processor, dynamic scheduling) ;
retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace ;
and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor .

US20030200085A1
CLAIM 14
. The method of claim 11 wherein the step of partitioning the active search space further comprises allocating the active search space amongst the plurality of the processing nodes based on available processing power (second processor, dynamic scheduling) associated with each processing node .

US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end (search algorithm) and search stage processing on the first processor (first processor) .
US20030200085A1
CLAIM 10
. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models , comprising : receiving continuous speech input ;
generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor (first processor) ;
retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor ;
retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace ;
and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor .

US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (second processor, processing power) whether the first processor (first processor) should run the front end (search algorithm) or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20030200085A1
CLAIM 10
. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models , comprising : receiving continuous speech input ;
generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor (first processor) ;
retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor (second processor, dynamic scheduling) ;
retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace ;
and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor .

US20030200085A1
CLAIM 14
. The method of claim 11 wherein the step of partitioning the active search space further comprises allocating the active search space amongst the plurality of the processing nodes based on available processing power (second processor, dynamic scheduling) associated with each processing node .

US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (first processor) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20030200085A1
CLAIM 10
. A method for improving pattern matching in a speech recognition system having a plurality of acoustic models , comprising : receiving continuous speech input ;
generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a first memory workspace accessible to a first processor (first processor) ;
retrieving a first acoustic model from the plurality of acoustic models into the first memory workspace ;
retrieving a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a second memory workspace accessible to a second processor ;
retrieving a second acoustic model from the plurality of acoustic models into the second memory workspace ;
and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the first acoustic model by the first processor contemporaneously with determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the second acoustic model by the second processor .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end (search algorithm) that the accelerator is ready to receive a next feature vector from the front end .
US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end (search algorithm) is configured to input a digital audio signal .
US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end (search algorithm) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
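The pipelined front end → distance calculation → search data flow recited in claim 14 can be sketched with lazy generator stages (an illustrative stand-in only: the per-frame argmin "search" is a placeholder for the lexical-tree search stage, and all shapes and values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
acoustic_states = rng.normal(size=(5, 13))   # hypothetical acoustic model states

def front_end(audio_frames):
    # Stage 1: one feature vector per defined audio time frame.
    for frame in audio_frames:
        yield np.asarray(frame, dtype=float)

def distance_stage(feature_vectors):
    # Stage 2: distance of each feature vector to every acoustic state.
    for fv in feature_vectors:
        yield np.linalg.norm(acoustic_states - fv, axis=1)

def search_stage(distance_rows):
    # Stage 3: best-matching state per frame (placeholder for propagating
    # scores through a lexical tree to identify words).
    return [int(np.argmin(row)) for row in distance_rows]

audio = rng.normal(size=(8, 13))             # 8 frames of stand-in features
best_states = search_stage(distance_stage(front_end(audio)))
assert len(best_states) == 8
```

Generators make the stages consume data frame by frame, which is one simple way to model the pipelined flow between connected stages.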
US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end (search algorithm) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030200085A1
CLAIM 16
. The method of claim 11 wherein the step of performing a searching operation on the observed acoustic data further comprises defining the search operation as at least one of a Viterbi search algorithm (front end) , a stack decoding algorithm , a multi-pass search algorithm and a forward-backward search algorithm .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (searching operation) .
US20030200085A1
CLAIM 5
. The method of claim 2 further comprises updating a search space based on the similarity measures for the first group of acoustic feature vectors ;
and subsequently performing a searching operation (word identification) on the search space .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
KR20020095731A

Filed: 2001-06-15     Issued: 2002-12-28

음성특징 추출장치 (Speech Feature Extraction Apparatus)

(Original Assignee) 주식회사 엑스텔테크놀러지     

김창민, 오상훈, 원영걸, 이수영
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end (청각의) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (음성구간) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
KR20020095731A
CLAIM 1
A speech feature extraction apparatus which extracts features of an input speech signal by splitting the input speech signal into a plurality of different frequency bands through band-pass filters , the apparatus comprising : feature vector composition means for detecting zero-crossing and intensity information of the per-band speech signal output from each band-pass filter and applying auditory (audio front end) perceptual characteristics to the detected values to produce a speech feature vector with low sensitivity to noise ;
feature reduction means for reducing the dimension and number of the speech feature vectors output from the feature vector composition means ;
and normalization means for storing the speech feature vectors output from the feature reduction means over a speech interval (audio time frame) while normalizing them with respect to time and intensity .
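The per-band zero-crossing and intensity detection recited in claim 1 of KR20020095731A can be sketched as follows (illustrative only; pure sine tones stand in for band-pass filter outputs, and the feature definitions are assumptions, not the patent's implementation):

```python
import numpy as np

def band_features(band_signal):
    """Zero-crossing count and intensity (energy) for one band-pass output."""
    x = np.asarray(band_signal, dtype=float)
    # A zero crossing is a sign change between consecutive samples.
    zero_crossings = int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))
    intensity = float(np.sum(x * x))
    return zero_crossings, intensity

t = np.linspace(0, 1, 1000, endpoint=False)
low_band = np.sin(2 * np.pi * 5 * t)    # 5 Hz band output (stand-in)
high_band = np.sin(2 * np.pi * 50 * t)  # 50 Hz band output (stand-in)

zc_low, _ = band_features(low_band)
zc_high, _ = band_features(high_band)
assert zc_high > zc_low   # higher-frequency band crosses zero more often
```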

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end (청각의) is configured to input a digital audio signal .
KR20020095731A
CLAIM 1
A speech feature extraction apparatus which extracts features of an input speech signal by splitting the input speech signal into a plurality of different frequency bands through band-pass filters , the apparatus comprising : feature vector composition means for detecting zero-crossing and intensity information of the per-band speech signal output from each band-pass filter and applying auditory (audio front end) perceptual characteristics to the detected values to produce a speech feature vector with low sensitivity to noise ;
feature reduction means for reducing the dimension and number of the speech feature vectors output from the feature vector composition means ;
and normalization means for storing the speech feature vectors output from the feature reduction means over a speech interval while normalizing them with respect to time and intensity .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end (청각의) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (음성구간) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
KR20020095731A
CLAIM 1
A speech feature extraction apparatus which extracts features of an input speech signal by splitting the input speech signal into a plurality of different frequency bands through band-pass filters , the apparatus comprising : feature vector composition means for detecting zero-crossing and intensity information of the per-band speech signal output from each band-pass filter and applying auditory (audio front end) perceptual characteristics to the detected values to produce a speech feature vector with low sensitivity to noise ;
feature reduction means for reducing the dimension and number of the speech feature vectors output from the feature vector composition means ;
and normalization means for storing the speech feature vectors output from the feature reduction means over a speech interval (audio time frame) while normalizing them with respect to time and intensity .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end (청각의) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (음성구간) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
KR20020095731A
CLAIM 1
A speech feature extraction apparatus which extracts features of an input speech signal by splitting the input speech signal into a plurality of different frequency bands through band-pass filters , the apparatus comprising : feature vector composition means for detecting zero-crossing and intensity information of the per-band speech signal output from each band-pass filter and applying auditory (audio front end) perceptual characteristics to the detected values to produce a speech feature vector with low sensitivity to noise ;
feature reduction means for reducing the dimension and number of the speech feature vectors output from the feature vector composition means ;
and normalization means for storing the speech feature vectors output from the feature reduction means over a speech interval (audio time frame) while normalizing them with respect to time and intensity .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (음성구간) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
KR20020095731A
CLAIM 1
A speech feature extraction apparatus which extracts features of an input speech signal by splitting the input speech signal into a plurality of different frequency bands through band-pass filters , the apparatus comprising : feature vector composition means for detecting zero-crossing and intensity information of the per-band speech signal output from each band-pass filter and applying auditory perceptual characteristics to the detected values to produce a speech feature vector with low sensitivity to noise ;
feature reduction means for reducing the dimension and number of the speech feature vectors output from the feature vector composition means ;
and normalization means for storing the speech feature vectors output from the feature reduction means over a speech interval (audio time frame) while normalizing them with respect to time and intensity .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2000293191A

Filed: 1999-04-02     Issued: 2000-10-20

音声認識装置及び音声認識方法並びにその方法に用いられる木構造辞書の作成方法 (Speech recognition apparatus, speech recognition method, and method of creating a tree-structured dictionary used in the method)

(Original Assignee) Canon Inc; キヤノン株式会社     

Hiroki Yamamoto (山本 寛樹)
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (更に有すること, 備えること) vector from the front end .
JP2000293191A
CLAIM 4
[Claim 4] A tree-structured dictionary creation method for creating a tree-structured dictionary used in a speech recognition method , comprising : a step of sorting a plurality of words based on their constituent phonemes ; a step of assigning consecutive ID information in the sorted word order ; a step of building a tree in which phonemes common from the head phoneme of words are merged into the same node ; and a step of assigning , to those nodes of the tree-structured dictionary whose set of reachable words differs from that of their parent node , the number of words reachable from the node and the minimum or maximum of the ID information of the reachable words as node information ; the method characterized by comprising (next feature) the above steps .
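The dictionary-creation steps of JP2000293191A claim 4 (sort words, assign consecutive IDs, merge common prefix phonemes into shared nodes, store per-node reachable-word count and minimum ID) can be sketched as follows; letters stand in for phonemes, and the data layout is an illustrative assumption, not the patent's implementation:

```python
# Step 1: sort words by their constituent phonemes (letters as stand-ins).
words = sorted(["cat", "cap", "car", "dog"])
# Step 2: consecutive ID information in sorted order.
ids = {w: i for i, w in enumerate(words)}

# Step 3: prefix tree merging phonemes common from the head of each word.
tree = {}
for w in words:
    node = tree
    for ph in w:
        node = node.setdefault(ph, {})
    node.setdefault("$", set()).add(ids[w])   # "$" marks a word end

def reachable(node):
    out = set(node.get("$", set()))
    for k, child in node.items():
        if k != "$":
            out |= reachable(child)
    return out

def node_info(node):
    # Step 4: (reachable word count, minimum reachable ID) for a node.
    r = reachable(node)
    return len(r), min(r)

# Because IDs are consecutive in sorted order, (count, min ID) describes a
# contiguous ID range of reachable words for each node.
assert node_info(tree) == (4, 0)              # root reaches all four words
assert node_info(tree["c"]["a"]) == (3, 0)    # cap, car, cat
assert node_info(tree["d"]) == (1, 3)         # dog
```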

JP2000293191A
CLAIM 6
[Claim 6] The speech recognition method according to claim 5 , characterized by further comprising (next feature) : a step of acoustically analyzing the input speech ; and a step of obtaining an acoustic likelihood .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (認識結果) .
JP2000293191A
CLAIM 2
[Claim 2] A speech recognition method using : a tree-structured dictionary in which the phonemes constituting words are nodes and a plurality of words are expressed by a tree in which identical phonemes from the head phoneme of each word share a common node ; ID information corresponding to each word contained in the tree-structured dictionary , defined such that words sharing a common node have consecutive IDs ; and node information for those nodes of the tree-structured dictionary whose set of reachable words differs from that of their parent node , the node information including the number of words reachable from the node and the maximum or minimum of the ID information of those words ; the method comprising : a cumulative likelihood update step of updating the cumulative likelihood of the input speech with reference to the language likelihood corresponding to the node information ; and a step of outputting a recognition result (result memory) based on the final cumulative likelihood of the input speech .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words (可読メモリ) within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2000293191A
CLAIM 7
[Claim 7] A computer-readable memory (processor to identify words) storing a control program for controlling a speech recognition apparatus , the control program comprising : a program for creating a tree-structured dictionary by the tree-structured dictionary creation method according to claim 4 ; a program for capturing speech ; a program for analyzing the captured input speech and deriving , for each phoneme , a language likelihood corresponding to the node information of the tree-structured dictionary ; and a program for outputting , using the language likelihood , the word in the tree-structured dictionary best matching the input speech as a speech recognition result .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
GB2333172A

Filed: 1998-01-13     Issued: 1999-07-14

Recognition

(Original Assignee) SoftSound Ltd     (Current Assignee) SoftSound Ltd

Tony Robinson
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances (determining means) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

GB2333172A
CLAIM 17
. A method according to any preceding claim wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

GB2333172A
CLAIM 18
. Recognition apparatus for recognising sequential tokens grouped into one or more items , the apparatus comprising : storage means for storing data representing known items as respective finite state sequence models , where each state corresponds to a token and said models having common prefix states are organised in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state ;
comparing means for comparing each sequential data unit with stored reference data units identified by respective reference tokens to generate scores for each data unit indicating the similarity of the data unit to respective said reference data units ;
determining means (calculating distances) for determining an accumulated score for the final state in the models comprising a) means for sequentially calculating the accumulated score for a model to reach the final state comprising a leaf in the tree , b) means for identifying the closest branch to the leaf corresponding to a next model for which an accumulated score for the final state has not yet been calculated , and c) means for accumulating the score from the identified closest branch for the next model to the final state , wherein the scores are accumulated for the branches of the tree and accumulated for the plurality of trees ;
and means for identifying at least the item corresponding to the model having the highest accumulated score .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (temporal alignment) for a predetermined time frame (speech data) .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

GB2333172A
CLAIM 17
. A method according to any preceding claim wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (temporal alignment) .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means (calculating means) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

GB2333172A
CLAIM 17
. A method according to any preceding claim wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

GB2333172A
CLAIM 25
. Recognition apparatus according to claim 24 including second storage means for storing the accumulated scores for the temporal states for a final state of a model of an item as a group together with information identifying a grouping model state used to arrive at the final state , information identifying positions in the tree structure and information identifying the temporal position of the accumulated scores ;
reading means for reading a group of accumulated scores from said second storage means ;
wherein said calculating means (calculating means) is adapted to use the said accumulated scores for the plurality of temporal states as a plurality of temporally different initial scores for the calculation of the accumulated scores for the plurality of temporal states of a final state of a model of a subsequent item ;
and including means for determining the grouping model state used to arrive at the final state of the model of the subsequent item from the identification of the subsequent item and the grouping model state of the read group of accumulated scores .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (temporal alignment) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

GB2333172A
CLAIM 17
. A method according to any preceding claim wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
GB2333172A
CLAIM 16
. A method according to any one of claims 12 to 15 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

GB2333172A
CLAIM 17
. A method according to any preceding claim wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US5878392A

Filed: 1997-05-27     Issued: 1999-03-02

Speech recognition using recursive time-domain high-pass filtering of spectral feature vectors

(Original Assignee) US Philips Corp     (Current Assignee) US Philips Corp

Peter Meyer, Hans-Wilhelm Ruhl
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time frame) ;

a calculating circuit (generating means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means (calculating circuit) for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 4
. The speech recognition circuit of claim 1 , wherein the scanning means subdivides the speech signal into mutually overlapping time frames (time frame) .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (generating means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means (calculating circuit) for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (window function) for a predetermined time frame (time frame) .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 4
. The speech recognition circuit of claim 1 , wherein the scanning means subdivides the speech signal into mutually overlapping time frames (time frame) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal (result memory) to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (spectral component) signal .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 2
. The speech recognition circuit of claim 1 , wherein the discrete filtering means performs filtering in accordance with the following relationship : M(n , i)=V(n , i)-V(n-1 , i)+C M(n-1 , i) , wherein V(n , i) is a non-filtered spectral feature vector in the time domain , M(n , i) is filtered spectral feature vector , n represents a particular frame , i represents a spectral component (digital audio) of spectral feature vector M or V and C represents a constant .
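The recursive time-domain high-pass filter recited in claim 2 of US5878392A, M(n, i) = V(n, i) - V(n-1, i) + C·M(n-1, i), can be exercised numerically; in this minimal sketch the initialization M(0) = V(0) and the value C = 0.7 are assumptions not specified by the claim:

```python
import numpy as np

def highpass_features(V, C=0.7):
    """Recursive high-pass filtering of spectral feature vectors:
    M(n, i) = V(n, i) - V(n-1, i) + C * M(n-1, i)."""
    V = np.asarray(V, dtype=float)
    M = np.zeros_like(V)
    M[0] = V[0]                       # assumed initialization (not in claim)
    for n in range(1, len(V)):
        M[n] = V[n] - V[n - 1] + C * M[n - 1]
    return M

# A constant (DC) spectral component, e.g. a stationary channel offset,
# is suppressed toward zero over successive frames.
V = np.ones((50, 8))                  # 50 frames, 8 spectral components
M = highpass_features(V)
assert np.all(np.abs(M[-1]) < 1e-3)
```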

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time frame) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 4
. The speech recognition circuit of claim 1 , wherein the scanning means subdivides the speech signal into mutually overlapping time frames (time frame) .
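The subdivision into mutually overlapping time frames recited in US5878392A claim 4 can be sketched as follows; the frame length and hop size are illustrative values (e.g. 25 ms and 10 ms at a 16 kHz sample rate), not figures taken from either patent:

```python
def overlapping_frames(samples, frame_len=400, hop=160):
    # Subdivide a sample sequence into mutually overlapping frames:
    # consecutive frames share (frame_len - hop) samples. The default
    # frame_len/hop values are conventional, not claimed, figures.
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```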

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (window function) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time frame) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (generating means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means (calculating circuit) for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 4
. The speech recognition circuit of claim 1 , wherein the scanning means subdivides the speech signal into mutually overlapping time frames (time frame) .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time frame) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (pass filtering) , to the distance calculation , and to the word identification .
US5878392A
CLAIM 1
. A speech recognition circuit comprising : scanning means for periodically scanning a speech signal to form a sequence of scan values ;
frame forming means for receiving the sequence of scan values and forming a sequence of frames , each frame having a uniform number of scan values ;
windowing means for receiving a sequence of frames and weighting each frame with a window function (audio signal) ;
logarithmizing means for receiving each weighted sequence of frames and generating a logarithmized power density spectrum for each frame ;
spectral feature generating means for generating a spectral feature vector from the logarithmized power density spectrum ;
discrete filtering means for performing a recursive high-pass filtering (feature calculation) of the spectral feature vector utilizing a previously generated spectral feature vector ;
and comparison means for comparing the filtered spectral feature vector with a reference spectral feature vector and outputting a recognition signal .

US5878392A
CLAIM 4
. The speech recognition circuit of claim 1 , wherein the scanning means subdivides the speech signal into mutually overlapping time frames (time frame) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP0780828A2

Filed: 1996-12-17     Issued: 1997-06-25

Method and system for performing speech recognition

(Original Assignee) AT&T Corp     (Current Assignee) AT&T Corp

Mazin G. Rahim, Jay Gordon Wilpon
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain converter, said signals) ;

a calculating circuit for calculating distances (following steps) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

EP0780828A2
CLAIM 35
. A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .
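The "cepstral recursion analysis" recited in EP0780828A2 claim 35 is conventionally the LPC-to-cepstrum recurrence c_n = a_n + (1/n) Σ_{k=1}^{n-1} k·c_k·a_{n-k}. A sketch of that textbook recurrence, shown as general background (sign conventions for the predictor coefficients vary between references, and the claim's mel-LPC variant applies the same recurrence to mel-warped coefficients):

```python
def lpc_to_cepstrum(a, num_ceps):
    # Textbook LPC-to-cepstrum recursion:
    #   c[n] = a[n] + (1/n) * sum_{k=1..n-1} k * c[k] * a[n-k]
    # a holds predictor coefficients a_1..a_p (so a[0] is a_1);
    # coefficients beyond the predictor order are treated as zero.
    c = []
    for n in range(1, num_ceps + 1):
        a_n = a[n - 1] if n <= len(a) else 0.0
        acc = 0.0
        for k in range(1, n):
            a_nk = a[n - k - 1] if n - k <= len(a) else 0.0
            acc += (k / n) * c[k - 1] * a_nk
        c.append(a_n + acc)
    return c
```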

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (domain converter, said signals) on the first processor .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (filtered signal) from the front end .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

EP0780828A2
CLAIM 37
The method of claim 35 further comprising the step of : utilizing a mel-filter bank to filter the blocked speech signals and produce mel-filtered signals (next feature vector) which are then analyzed by performing linear predictive coding and cepstral recursion analysis in combination .

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio (domain converter, said signals) signal .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .
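For context on claim 13's Mahalanobis distance: acoustic-model implementations commonly assume a diagonal covariance, which reduces the distance to a variance-weighted squared difference. That simplification is our assumption here, not a limitation of the claim, which does not restrict the covariance structure:

```python
def mahalanobis_diag(feature, mean, variance):
    # Squared Mahalanobis distance under a diagonal covariance
    # (a common simplification in HMM acoustic models, assumed here):
    #   d^2 = sum_i (x_i - mu_i)^2 / var_i
    return sum((x - m) ** 2 / v
               for x, m, v in zip(feature, mean, variance))
```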

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain converter, said signals) ;

calculating means (following steps) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

EP0780828A2
CLAIM 35
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .
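The pipelined data flow recited in US7979277B2 claim 14 (audio front end, calculating means, and search stage connected to enable pipelined data flow) can be sketched with thread-safe queues between stages. The three stage functions below are hypothetical placeholders, not implementations from either patent:

```python
import queue
import threading

def run_pipeline(frames, front_end, distance_calc, search_step):
    # Three-stage pipeline (front end -> distance calculation -> search)
    # with FIFO queues carrying data between stages; a None sentinel
    # marks end of input. Each stage runs concurrently with the others.
    q1, q2, results = queue.Queue(), queue.Ueue() if False else queue.Queue(), []

    def stage_front_end():
        for f in frames:
            q1.put(front_end(f))       # emit feature vectors
        q1.put(None)

    def stage_distance():
        while (fv := q1.get()) is not None:
            q2.put(distance_calc(fv))  # emit distances
        q2.put(None)

    threads = [threading.Thread(target=stage_front_end),
               threading.Thread(target=stage_distance)]
    for t in threads:
        t.start()
    while (d := q2.get()) is not None:
        results.append(search_step(d))  # search stage consumes distances
    for t in threads:
        t.join()
    return results
```

Because each queue is FIFO and each stage is a single consumer, frame order is preserved end to end while the stages overlap in time.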

US7979277B2
CLAIM 15
. A speech recognition method (following steps) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain converter, said signals) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

EP0780828A2
CLAIM 35
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (following steps) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain converter, said signals) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
EP0780828A2
CLAIM 8
A system for compensating for enhancement of speech signals for optimizing speech recognition performance , the system comprising : an enhancer for selectively varying the gain of select frequencies of time varying speech signals transmitted on a network path ;
a receiver for receiving said enhanced speech signals ;
a frequency domain converter (digital audio, digital audio signal, audio time frame, search stage processing) for converting the enhanced speech signal received at the receiver to frequency domain representations ;
a compensator for receiving the frequency domain representations of the speech signals enhanced by said enhancer , wherein said compensator introduces gain variations to the frequency domain representations of the speech signals transmitted on the path for compensating for gain variations introduced to the speech signals by said enhancer ;
and , a cepstral feature computer for computing cepstral features from the compensated , frequency domain representations of the enhanced speech signals .

EP0780828A2
CLAIM 32
A system for recognizing speech (speech recognition circuit, word identification) signals and for compensating for network enhancement of said signals (digital audio, digital audio signal, audio time frame, search stage processing) comprising : a filter for compensating for a network enhancement component of enhanced speech signals ;
a feature extractor for extracting features based on filtered speech signals from said filter ;
and a speech recognizer for recognizing speech signals based on extracted features from the feature extractor .

EP0780828A2
CLAIM 35
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1093112A2

Filed: 1996-12-17     Issued: 2001-04-18

A method for generating speech feature signals and an apparatus for carrying through this method

(Original Assignee) AT&T Corp     (Current Assignee) AT&T Corp

Mazin G. Rahim, Jay Gordon Wilpon
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (domain representation, domain converter, said signals) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain representation, domain converter, said signals) ;

a calculating circuit for calculating distances (following steps) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1093112A2
CLAIM 2
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .

EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (domain representation, domain converter, said signals) on the first processor .
EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (domain representation, domain converter, said signals) for a predetermined time frame .
EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (filtered signal) from the front end .
EP1093112A2
CLAIM 4
The method of claim 2 , further comprising the step of : utilizing a mel-filter bank to filter the blocked speech signals and produce mel-filtered signals (next feature vector) which are then analyzed by performing linear predictive coding and cepstral recursion analysis in combination .
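The mel-filter bank recited in this claim is conventionally a set of triangular filters spaced evenly on the mel scale and applied to a spectrum. The filter count, FFT size, and sample rate below are illustrative assumptions, not values taken from EP1093112A2.

```python
import math

# Sketch of a mel filter bank: triangular filters evenly spaced on the mel
# scale. Filter count and band edges are assumptions for illustration.
def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_bins, sample_rate):
    """Return triangular filter weights, one row of bin weights per filter."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bin_of = [int(fft_bins * mel_to_hz(m) / (sample_rate / 2.0)) for m in mel_points]
    filters = []
    for i in range(1, num_filters + 1):
        left, centre, right = bin_of[i - 1], bin_of[i], bin_of[i + 1]
        row = [0.0] * (fft_bins + 1)
        for b in range(left, centre):               # rising slope
            row[b] = (b - left) / max(centre - left, 1)
        for b in range(centre, right):              # falling slope
            row[b] = (right - b) / max(right - centre, 1)
        filters.append(row)
    return filters

fb = mel_filterbank(num_filters=8, fft_bins=64, sample_rate=8000)
print(len(fb), max(fb[0]))  # 8 filters, each peaking at weight 1.0
```

Applying each row as a dot product against a power spectrum yields the mel-filtered signals that the claim then feeds to LPC and cepstral recursion analysis.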

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (domain representation, domain converter, said signals) .
EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (domain representation, domain converter, said signals) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain representation, domain converter, said signals) ;

calculating means (following steps) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1093112A2
CLAIM 2
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .

EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 15
. A speech recognition method (following steps) , comprising : calculating a feature vector from an audio signal (domain representation, domain converter, said signals) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain representation, domain converter, said signals) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1093112A2
CLAIM 2
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .

EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (following steps) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (domain representation, domain converter, said signals) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (domain representation, domain converter, said signals) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (LPC coefficient) .
EP1093112A2
CLAIM 2
A method of generating feature signals from speech signals comprising the following steps (calculating means, speech recognition method, calculating distances) : receiving the speech signals ;
blocking the speech signals into frames ;
performing in combination linear predictive coding and cepstral recursion analysis on the blocked speech signals to produce mel-LPC cepstrum feature signals .

EP1093112A2
CLAIM 5
The method of claim 2 , further comprising the step of preemphasizing the speech signals to achieve spectral flattening of said signals (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 7
The method of claim 6 , further comprising the step of transforming each of the Hamming window frames from a time representation to a frequency domain representation (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) .

EP1093112A2
CLAIM 10
The method of claim 9 , wherein the linear predictive coding analysis operates to convert the autocorrelation coefficients to LPC coefficients (word identification) and the cepstral recursion analysis operates to compute cepstral parameters from the LPC coefficients .
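The conversion chain recited in this claim (autocorrelation → LPC coefficients → cepstral parameters) can be sketched with the standard Levinson-Durbin recursion followed by the LPC-to-cepstrum recursion. Sign conventions vary between texts; this sketch assumes the predictor form x[n] ≈ Σ aₖ·x[n−k] and a toy autocorrelation sequence.

```python
# Sketch of claim 10's pipeline: autocorrelation -> LPC via Levinson-Durbin,
# then cepstral parameters via the standard LPC-to-cepstrum recursion.
def levinson_durbin(r, order):
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, e = new_a, e * (1.0 - k * k)
    return a[1:]  # a_1 .. a_p

def lpc_to_cepstrum(lpc, num_ceps):
    # c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k]
    c = [0.0] * (num_ceps + 1)
    for n in range(1, num_ceps + 1):
        acc = lpc[n - 1] if n <= len(lpc) else 0.0
        for k in range(1, n):
            if n - k <= len(lpc):
                acc += (k / n) * c[k] * lpc[n - k - 1]
        c[n] = acc
    return c[1:]

r = [1.0, 0.5, 0.25]               # toy autocorrelation sequence (assumed)
lpc = levinson_durbin(r, order=2)
ceps = lpc_to_cepstrum(lpc, num_ceps=3)
print([round(v, 4) for v in lpc], [round(v, 4) for v in ceps])
```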

EP1093112A2
CLAIM 11
A feature extractor apparatus for generating speech feature signals characterizing speech signals comprising : a frequency domain converter (audio signal, digital audio signal, digital audio, audio time frame, search stage processing) for generating a set of spectral samples representing the speech signals ;
a weighting unit for selectively weighting the set of spectral samples ;
and a feature computer for generating speech characterizing feature signals based on the weighted set of spectral samples .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US5881312A

Filed: 1996-12-09     Issued: 1999-03-09

Memory transfer apparatus and method useful within a pattern recognition system

(Original Assignee) Intel Corp     (Current Assignee) Intel Corp

Carole Dulong
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (control function) whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US5881312A
CLAIM 1
. A memory transfer apparatus comprising : processing means for performing comparisons of written characters and for providing control functions (comprising dynamic scheduling) ;
memory transfer means for performing memory transfer operations by generating memory access requests in response to a plurality of parameters provided by said processing means , wherein said plurality of parameters are not modified by performing said memory transfer operations such that subsequent memory transfer operations are performed in accordance with said plurality of parameters ;
an external memory coupled to said memory transfer means , said external memory to store a plurality of reference patterns representing a plurality of written characters ;
and first and second internal memories , coupled to said memory transfer means ;
wherein a first selected reference pattern representing a first written character of said plurality of reference patterns is transferred to said first internal memory from said external memory during a comparison by said processing means of a second selected reference pattern representing a second written character previously transferred to said second internal memory and an unknown pattern representing an unknown written character , and further wherein a third selected reference pattern representing a third written character of said plurality of reference patterns is transferred to said second internal memory during subsequent comparison by said processing means of said first selected reference pattern and said unknown pattern .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (performing comparison) .
US5881312A
CLAIM 1
. A memory transfer apparatus comprising : processing means for performing comparisons (result memory) of written characters and for providing control functions ;
memory transfer means for performing memory transfer operations by generating memory access requests in response to a plurality of parameters provided by said processing means , wherein said plurality of parameters are not modified by performing said memory transfer operations such that subsequent memory transfer operations are performed in accordance with said plurality of parameters ;
an external memory coupled to said memory transfer means , said external memory to store a plurality of reference patterns representing a plurality of written characters ;
and first and second internal memories , coupled to said memory transfer means ;
wherein a first selected reference pattern representing a first written character of said plurality of reference patterns is transferred to said first internal memory from said external memory during a comparison by said processing means of a second selected reference pattern representing a second written character previously transferred to said second internal memory and an unknown pattern representing an unknown written character , and further wherein a third selected reference pattern representing a third written character of said plurality of reference patterns is transferred to said second internal memory during subsequent comparison by said processing means of said first selected reference pattern and said unknown pattern .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JPH08110793A

Filed: 1995-07-27     Issued: 1996-04-30

Method and system for improved speech recognition by front-end normalization of feature vectors (特性ベクトルの前端正規化による音声認識の改良方法及びシステム)

(Original Assignee) Microsoft Corp

Alejandro Acero, Xuedong Huang
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage (受け取る段階) for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage (受け取る段階) processing on the first processor .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage (受け取る段階) code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage (受け取る段階) and front ends as separate threads .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage (受け取る段階) when the distances for a new frame are available in a result memory .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage (受け取る段階) for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

US7979277B2
CLAIM 15
. A speech recognition method (システム) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage (受け取る段階) to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JPH08110793A
CLAIM 2
[Claim 2] The method of claim 1, wherein said step of receiving a feature vector comprises a step of receiving a cepstral vector (受け取る段階) (search stage, search stage processing, search stage code).

JPH08110793A
CLAIM 20
[Claim 20] A system (システム) (speech recognition method) for improving speech recognition by front-end normalization of feature vectors, wherein the speech consists of utterances, each utterance comprising frames of speech, and each frame of speech is represented by a feature vector, the system comprising: a database of known utterances, the database having a mean noise feature vector and a mean speech feature vector; and an input normalizer, the normalizer receiving a feature vector representing a frame of speech in an utterance to be recognized, the frame of speech having a probability of being noise, and the utterance having a mean noise feature vector and a mean speech feature vector; the normalizer further calculating a correction vector based on the probability that the frame of speech is noise and on the mean noise and speech feature vectors for the utterance and for the database of utterances, and calculating a normalized feature vector based on the feature vector and the correction vector.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (システム) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JPH08110793A
CLAIM 20
[Claim 20] A system (システム) (speech recognition method) for improving speech recognition by front-end normalization of feature vectors, wherein the speech consists of utterances, each utterance comprising frames of speech, and each frame of speech is represented by a feature vector, the system comprising: a database of known utterances, the database having a mean noise feature vector and a mean speech feature vector; and an input normalizer, the normalizer receiving a feature vector representing a frame of speech in an utterance to be recognized, the frame of speech having a probability of being noise, and the utterance having a mean noise feature vector and a mean speech feature vector; the normalizer further calculating a correction vector based on the probability that the frame of speech is noise and on the mean noise and speech feature vectors for the utterance and for the database of utterances, and calculating a normalized feature vector based on the feature vector and the correction vector.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1151218A

Filed: 1995-04-25     Issued: 1997-06-04

Method for training neural networks for speech recognition (用于语音识别的神经网络的训练方法)

(Original Assignee) Motorola Inc     (Current Assignee) Motorola Solutions Inc

Shay-Ping Thomas Wang (沙-平·托马斯·王)
US7979277B2
CLAIM 1
. A speech recognition circuit (用于识别, 语音识别) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 2
. A speech recognition circuit (用于识别, 语音识别) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 3
. A speech recognition circuit (用于识别, 语音识别) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 4
. A speech recognition circuit (用于识别, 语音识别) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 5
. A speech recognition circuit (用于识别, 语音识别) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 6
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 7
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 8
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 9
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 10
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (上述一个) .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks used in a speech recognition (语音识别) (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method comprising the steps of: (a) receiving an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion generating a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting the input portion of one of said plurality of training examples equal to the selected data block; (f) selecting one of said plurality of neural networks, and determining whether the selected neural network is used to identify (用于识别) (speech recognition circuit, speech recognition method, word identification) the selected data block; (i) if so, setting the output portion of said one (上述一个) (result memory) training example to 1; (ii) if not, setting the output portion of said one training example to 0; (g) storing said one training example; (h) determining whether another of said plurality of data blocks exists; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 11
. The speech recognition circuit (用于识别, 语音识别) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 12
. The speech recognition circuit (用于识别 "for identifying", 语音识别 "speech recognition") of claim 1 , wherein the audio front end is configured to input a digital audio signal .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 13
. A speech recognition circuit (用于识别 "for identifying", 语音识别 "speech recognition") of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.
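Claim 13 narrows the claimed distance to a Mahalanobis distance between a feature vector and an acoustic state. As a hedged illustration only (assuming a diagonal-covariance Gaussian acoustic state, a common simplification that neither patent mandates):

```python
def mahalanobis_diag(feature, mean, var):
    """Squared Mahalanobis distance between a feature vector and an
    acoustic state modelled as a diagonal-covariance Gaussian: for each
    dimension, the squared deviation from the mean scaled by the variance."""
    return sum((f - m) ** 2 / v for f, m, v in zip(feature, mean, var))

# (1-0)^2/1 + (2-0)^2/4 = 1 + 1 = 2
d = mahalanobis_diag([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

With a full (non-diagonal) covariance the per-dimension sum would be replaced by the quadratic form (x - mu)^T * inv(Sigma) * (x - mu).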

US7979277B2
CLAIM 14
. A speech recognition circuit (用于识别 "for identifying", 语音识别 "speech recognition") , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.
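Claim 14's three stages "connected to each other to enable pipelined data flow" can be sketched with chained Python generators, where each downstream stage consumes the previous stage's output frame by frame. The stage internals below (plain squared-Euclidean distances and an argmin "search") are placeholder assumptions for illustration, not the patent's implementation:

```python
def front_end(frames):
    """Audio front end: one feature vector per audio time frame."""
    for frame in frames:
        yield [float(x) for x in frame]

def distance_stage(features, states):
    """Distance calculation: one row of per-state distances per vector."""
    for vec in features:
        yield [sum((f - s) ** 2 for f, s in zip(vec, st)) for st in states]

def search_stage(distances):
    """Search: here simply the index of the best-matching state per frame."""
    for row in distances:
        yield min(range(len(row)), key=row.__getitem__)

states = [[0.0, 0.0], [1.0, 1.0]]
pipeline = search_stage(distance_stage(front_end([[1, 1], [0, 0]]), states))
best = list(pipeline)
```

Because generators are lazy, the first frame flows through all three stages before the second frame is read, which is the pipelined-data-flow property the claim recites.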

US7979277B2
CLAIM 15
. A speech recognition method (用于识别 "for identifying", 语音识别 "speech recognition") , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (用于识别 "for identifying", 语音识别 "speech recognition") , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (用于识别 "for identifying", 语音识别 "speech recognition") .
CN1151218A
CLAIM 1
. A method of training a plurality of neural networks for use in a speech recognition (speech recognition circuit, speech recognition method, word identification) system, each of said neural networks comprising a plurality of neurons, the method producing a plurality of training examples, each of said training examples comprising an input portion and an output portion, the method consisting of the following steps: (a) accepting an example of a spoken word; (b) performing analog-to-digital conversion on said spoken word, said conversion producing a digitized word; (c) performing cepstral analysis on said digitized word, said analysis producing a sequence of data frames; (d) generating a plurality of data blocks from said sequence of data frames; (e) selecting one of said plurality of data blocks, and setting said input portion of one of said plurality of training examples equal to said selected data block; (f) selecting one of said plurality of neural networks, and determining whether said selected neural network is intended to identify (speech recognition circuit, speech recognition method, word identification) said selected data block; (i) if so, setting said output portion of said one training example to 1; (ii) if not, setting said output portion of said one training example to 0; (g) storing said one training example; (h) determining whether there is another of said plurality of data blocks; (i) if so, returning to step (e); (ii) if not, terminating the method.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP0627726A1

Filed: 1994-06-01     Issued: 1994-12-07

Pattern recognition with a tree structure used for reference pattern feature vectors or for HMM

(Original Assignee) NEC Corp     (Current Assignee) NEC Corp

Takao Watanabe (c/o NEC Corporation)
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (calculating step, control means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step (control means, calculating circuit, distance calculation) to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .
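The staged search recited in EP0627726A1's claim 1 prunes the reference-pattern space by scoring, at each stage, only the daughter nodes of the node selected at the previous stage. A minimal greedy sketch of that descent; the dict-based node layout and the toy similarity scores are assumptions for illustration, and the claim's full path/branch pattern-similarity accumulation is omitted:

```python
def tree_search(root, similarity):
    """Descend from the root; at each stage score only the daughters of
    the currently selected node and keep the extremum (here: maximum),
    until a leaf node is reached."""
    node = root
    while node.get("children"):
        node = max(node["children"], key=lambda n: similarity(n["data"]))
    return node["data"]

# Toy cluster similarities per node label (illustrative only).
scores = {"A": 0.2, "B": 0.9, "B1": 0.5, "B2": 0.8}
tree = {"data": "root", "children": [
    {"data": "A", "children": []},
    {"data": "B", "children": [
        {"data": "B1", "children": []},
        {"data": "B2", "children": []},
    ]},
]}
leaf = tree_search(tree, scores.get)  # selects "B" at stage 1, "B2" at stage 2
```

The efficiency claim behind the tree structure is that only the daughters along the selected path are ever scored, rather than every leaf-stage reference pattern.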

EP0627726A1
CLAIM 10
A device as claimed in any one of Claims 6 to 9 , said at least one intermediate stage comprising a first plurality of intermediate stages , characterised in that said reference pattern memory means (13) comprises :    frame vector tree memorizing means (29) for storing said tree structure ;
   clustering vector memory means (31) preliminarily loaded with said set at clustering vectors ;
   cluster vector calculating means (35) for clustering said clustering vectors into a second plurality of cluster groups with clusters of said cluster groups represented by said cluster vectors , respectively , said second plurality being equal to said first plurality less one ;
and    control means (control means, calculating circuit, distance calculation) (33) for making in said frame vector tree memory means (29) the intermediate nodes of said intermediate stages and said leaf nodes represent said cluster vectors with said cluster groups successively assigned to said intermediate stages except for one of said intermediate stages that is nearest to said root stage .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (calculating step, control means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP0627726A1
CLAIM 1
A pattern recognition method for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step (control means, calculating circuit, distance calculation) to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

EP0627726A1
CLAIM 10
A device as claimed in any one of Claims 6 to 9 , said at least one intermediate stage comprising a first plurality of intermediate stages , characterised in that said reference pattern memory means (13) comprises :    frame vector tree memorizing means (29) for storing said tree structure ;
   clustering vector memory means (31) preliminarily loaded with said set at clustering vectors ;
   cluster vector calculating means (35) for clustering said clustering vectors into a second plurality of cluster groups with clusters of said cluster groups represented by said cluster vectors , respectively , said second plurality being equal to said first plurality less one ;
and    control means (control means, calculating circuit, distance calculation) (33) for making in said frame vector tree memory means (29) the intermediate nodes of said intermediate stages and said leaf nodes represent said cluster vectors with said cluster groups successively assigned to said intermediate stages except for one of said intermediate stages that is nearest to said root stage .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (calculating step, control means) adapted to implement frame dropping , to discard one or more audio time frames .
EP0627726A1
CLAIM 1
A pattern recognition method for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step (control means, calculating circuit, distance calculation) to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

EP0627726A1
CLAIM 10
A device as claimed in any one of Claims 6 to 9 , said at least one intermediate stage comprising a first plurality of intermediate stages , characterised in that said reference pattern memory means (13) comprises :    frame vector tree memorizing means (29) for storing said tree structure ;
   clustering vector memory means (31) preliminarily loaded with said set at clustering vectors ;
   cluster vector calculating means (35) for clustering said clustering vectors into a second plurality of cluster groups with clusters of said cluster groups represented by said cluster vectors , respectively , said second plurality being equal to said first plurality less one ;
and    control means (control means, calculating circuit, distance calculation) (33) for making in said frame vector tree memory means (29) the intermediate nodes of said intermediate stages and said leaf nodes represent said cluster vectors with said cluster groups successively assigned to said intermediate stages except for one of said intermediate stages that is nearest to said root stage .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (recognition method) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (recognition method) from the front end .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector (recognition method) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (calculating step, control means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step (control means, calculating circuit, distance calculation) to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .

EP0627726A1
CLAIM 10
A device as claimed in any one of Claims 6 to 9 , said at least one intermediate stage comprising a first plurality of intermediate stages , characterised in that said reference pattern memory means (13) comprises :    frame vector tree memorizing means (29) for storing said tree structure ;
   clustering vector memory means (31) preliminarily loaded with said set at clustering vectors ;
   cluster vector calculating means (35) for clustering said clustering vectors into a second plurality of cluster groups with clusters of said cluster groups represented by said cluster vectors , respectively , said second plurality being equal to said first plurality less one ;
and    control means (control means, calculating circuit, distance calculation) (33) for making in said frame vector tree memory means (29) the intermediate nodes of said intermediate stages and said leaf nodes represent said cluster vectors with said cluster groups successively assigned to said intermediate stages except for one of said intermediate stages that is nearest to said root stage .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (calculating step, control means) , and to the word identification .
EP0627726A1
CLAIM 1
A pattern recognition method (feature vector) for locating an input pattern among a plurality of reference patterns represented by a set of characteristic data , comprising the steps of :    representing said input pattern as a time sequence of input pattern feature vectors ;
   representing said set of characteristic data as a tree structure comprising a root node representing on a root stage said set , a plurality of leaf nodes representing individually said characteristic data on a leaf stage farthest from said root stage , and a plurality of intermediate nodes representing subsets of said set on at least one intermediate stage between said root and said leaf stages , said subsets and the characteristic data represented by said leaf node being used as cluster data , respectively ;
   calculating cluster similarity measures between each input pattern feature vector and specific data represented among cluster data by specified nodes specified among said intermediate and said leaf nodes on a single specified stage ;
   selecting at least one selected node among daughter nodes of a mother node , said selected node representing ones of said cluster data for which an extremum of said cluster similarity measures is calculated , said daughter nodes being on a stage next farther from said root stage than a stage of said mother node ;
   controlling said calculating step (control means, calculating circuit, distance calculation) to specify said specified stage consecutively towards said leaf stage from a stage nearest to said root stage in said at least one intermediate stage with said specified nodes given first by the daughter nodes of said root node and subsequently by the daughter nodes of each of said at least one selected node ;
   controlling said selecting step to select said selected node from said intermediate nodes ;
   calculating pattern similarity measures between said input pattern and said reference patterns with each pattern similarity measure calculated by using said cluster similarity measures along a path from each of said at least one selected node selected with said root node used as the mother node and along branches branched from said path to ones of said leaf nodes when said ones of leaf nodes are used as the daughter nodes of each of said at least one selected node selected ultimately in each branch from said intermediate nodes ;
and    locating said input pattern as one of said reference patterns for which an extremum of said pattern similarity measures is calculated .
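For orientation, the staged root-to-leaf selection recited in EP0627726A1 claim 1 amounts to a beam-limited descent of a cluster tree: at each stage the daughter nodes of the surviving nodes are scored, and only the best are expanded further. The toy two-stage tree, Euclidean similarity measure, and beam width below are invented for illustration; they are not taken from the patent.

```python
import math

# Hypothetical toy tree: each node carries a representative cluster vector
# and its daughter nodes; leaves stand for individual reference frame vectors.
TREE = {
    "root": {"vec": None, "children": ["a", "b"]},
    "a": {"vec": [0.0, 0.0], "children": ["a1", "a2"]},
    "b": {"vec": [5.0, 5.0], "children": ["b1", "b2"]},
    "a1": {"vec": [0.0, 1.0], "children": []},
    "a2": {"vec": [1.0, 0.0], "children": []},
    "b1": {"vec": [5.0, 6.0], "children": []},
    "b2": {"vec": [6.0, 5.0], "children": []},
}

def dist(u, v):
    """Euclidean distance used as the cluster similarity measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def tree_search(x, tree, beam=1):
    """Descend stage by stage from the root, keeping the `beam` closest
    daughter nodes at each stage, and return the surviving leaves."""
    frontier = ["root"]
    leaves = []
    while frontier:
        # Specified nodes for this stage: daughters of every selected node.
        children = [c for n in frontier for c in tree[n]["children"]]
        if not children:
            break
        scored = sorted(children, key=lambda c: dist(x, tree[c]["vec"]))
        frontier = [c for c in scored[:beam] if tree[c]["children"]]
        leaves += [c for c in scored[:beam] if not tree[c]["children"]]
    return leaves

# The leaf nearest to [0.9, 0.1] is reached without scoring the "b" subtree.
print(tree_search([0.9, 0.1], TREE))  # -> ['a2']
```

The point of the structure is that only one branch per stage is scored in full, which is what makes the tree-structured reference pattern cheaper than exhaustive matching.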

EP0627726A1
CLAIM 10
A device as claimed in any one of Claims 6 to 9 , said at least one intermediate stage comprising a first plurality of intermediate stages , characterised in that said reference pattern memory means (13) comprises :    frame vector tree memorizing means (29) for storing said tree structure ;
   clustering vector memory means (31) preliminarily loaded with said set as clustering vectors ;
   cluster vector calculating means (35) for clustering said clustering vectors into a second plurality of cluster groups with clusters of said cluster groups represented by said cluster vectors , respectively , said second plurality being equal to said first plurality less one ;
and    control means (control means, calculating circuit, distance calculation) (33) for making in said frame vector tree memorizing means (29) the intermediate nodes of said intermediate stages and said leaf nodes represent said cluster vectors with said cluster groups successively assigned to said intermediate stages except for one of said intermediate stages that is nearest to said root stage .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JPH07248790A

Filed: 1994-03-10     Issued: 1995-09-26

Speech recognition system [音声認識システム]

(Original Assignee) Fujitsu Ltd; 富士通株式会社     

Ryosuke Hamazaki, 良介 濱崎, Akihiro Kimura, 晋太 木村
US7979277B2
CLAIM 1
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (演算処理) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

JPH07248790A
CLAIM 3
[Claim 3] The speech recognition system according to claim 2, characterized in that the setting means (18) determines the recognition parameters to be set by applying prescribed arithmetic processing [演算処理] (calculating circuit) to the combination of recognition parameter values indicated by the detected execution-environment values read out from a table.
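As context for the pipelining limitation of US7979277B2 claim 1, the claimed data flow (front end and search stage on a first processor, distance calculation on a second, with data pipelined between them) might be sketched as follows. The queues, the trivial one-number "feature", and the three-state stand-in acoustic model are all invented for illustration, not taken from the patent.

```python
import queue
import threading

feat_q, dist_q = queue.Queue(), queue.Queue()
STATES = [0.0, 1.0, 4.0]          # stand-in acoustic-state means

def distance_worker():
    """Second 'processor': turn each feature vector into per-state distances."""
    while True:
        f = feat_q.get()
        if f is None:             # sentinel: end of audio
            dist_q.put(None)
            return
        dist_q.put([abs(f - m) for m in STATES])

threading.Thread(target=distance_worker, daemon=True).start()

# Front end (first processor): one "feature" per audio frame.
for frame in [0.9, 3.8]:
    feat_q.put(frame)
feat_q.put(None)

# Search stage (first processor): pick the best-scoring state per frame.
best = []
while (d := dist_q.get()) is not None:
    best.append(d.index(min(d)))
print(best)  # -> [1, 2]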

US7979277B2
CLAIM 2
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 3
. A speech recognition circuit (音声信号) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (検出結果) and/or availability of space for storing more feature vectors and/or distance results .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result [検出結果] (distance results, result memory) of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).
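The dynamic-scheduling limitation of US7979277B2 claim 3 (choosing whether the first processor runs front-end or search-stage code based on availability of distance results and of space for more feature vectors) can be illustrated roughly as below. The buffer capacity, the cost model, and the one-step accelerator stand-in are assumptions made for the sketch.

```python
from collections import deque

FEAT_CAPACITY = 2
feature_buf, distance_buf, schedule = deque(), deque(), []

def accelerator_step():
    """Stand-in for the second processor consuming one feature vector."""
    if feature_buf:
        distance_buf.append(feature_buf.popleft() * 10)

frames = deque(range(4))
recognized = []
while frames or feature_buf or distance_buf:
    if distance_buf:                      # distance results available
        schedule.append("search")
        recognized.append(distance_buf.popleft())
    elif frames and len(feature_buf) < FEAT_CAPACITY:
        schedule.append("front_end")      # space for more feature vectors
        feature_buf.append(frames.popleft())
    accelerator_step()

print(schedule)
```

With these toy parameters the scheduler settles into strict alternation of front-end and search work, which is the behaviour the dependent claim 2 ("alternating of front end and search stage processing") also recites.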

US7979277B2
CLAIM 4
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 5
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the said calculating circuit (演算処理) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

JPH07248790A
CLAIM 3
[Claim 3] The speech recognition system according to claim 2, characterized in that the setting means (18) determines the recognition parameters to be set by applying prescribed arithmetic processing [演算処理] (calculating circuit) to the combination of recognition parameter values indicated by the detected execution-environment values read out from a table.

US7979277B2
CLAIM 6
. The speech recognition circuit (音声信号) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).
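Claim 6's frame dropping (control means that discard one or more audio time frames) can be sketched as a backlog-driven filter: when the recognizer falls behind, incoming frames are dropped rather than queued. The backlog model, the per-frame cost of 2 ticks, and the threshold are invented for the sketch.

```python
MAX_BACKLOG = 3

def filter_frames(frames, processed_per_tick=1):
    """Keep a frame only if the processing backlog has room; otherwise drop it."""
    backlog, kept, dropped = 0, [], []
    for f in frames:
        backlog = max(0, backlog - processed_per_tick)  # work done this tick
        if backlog < MAX_BACKLOG:
            kept.append(f)
            backlog += 2        # each kept frame costs 2 ticks of processing
        else:
            dropped.append(f)
    return kept, dropped

kept, dropped = filter_frames(list(range(8)))
print(len(kept), len(dropped))  # -> 5 3
```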

US7979277B2
CLAIM 7
. The speech recognition circuit (音声信号) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).
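Claim 7's feature vector of spectral components for a predetermined time frame can be illustrated with a naive DFT over one synthetic 8-sample frame. An FFT library would normally be used; the plain-Python DFT here only keeps the sketch dependency-free, and the test signal is invented.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Spectral magnitudes of one audio time frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]    # keep the non-redundant half

# One period of a cosine across the frame: energy concentrates in bin 1.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
mags = dft_magnitudes(frame)
print(max(range(len(mags)), key=lambda k: mags[k]))  # -> 1
```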

US7979277B2
CLAIM 8
. The speech recognition circuit (音声信号) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 9
. The speech recognition circuit (音声信号) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (備えること) vector from the front end .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

JPH07248790A
CLAIM 4
[Claim 4] The speech recognition system according to claim 1, 2 or 3, characterized by comprising [備えること] (next feature) registration means (13) for registering in a dictionary the feature data of the speech signal extracted in accordance with the recognition parameters set by the setting means (18).

US7979277B2
CLAIM 10
. The speech recognition circuit (音声信号) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (検出結果) .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result [検出結果] (distance results, result memory) of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 11
. The speech recognition circuit (音声信号) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 12
. The speech recognition circuit (音声信号) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 13
. A speech recognition circuit (音声信号) of claim 1 , wherein said distance comprises a Mahalanobis distance .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).
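Claim 13 recites that the distance comprises a Mahalanobis distance. For a Gaussian acoustic state with diagonal covariance (a common simplification in HMM recognizers, assumed here rather than taken from the patent) it reduces to a variance-weighted Euclidean distance:

```python
import math

def mahalanobis_diag(x, mean, var):
    """sqrt(sum((x_i - mu_i)^2 / sigma_i^2)) for a diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

# Invented 2-dimensional feature vector and state parameters.
x, mean, var = [1.0, 2.0], [0.0, 0.0], [1.0, 4.0]
print(mahalanobis_diag(x, mean, var))  # -> sqrt(1 + 1) = 1.414...
```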

US7979277B2
CLAIM 14
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JPH07248790A
CLAIM 1
[Claim 1] A speech recognition system that executes recognition processing of a speech signal by extracting feature data from an incoming speech signal [音声信号] (speech recognition circuit) and matching it against dictionary data, the system comprising: detection means (17) for detecting the hardware execution environment of the computer on which the system runs; and setting means (18) for determining and setting speech recognition parameters according to the detection result of said detection means (17); the system being characterized in that it extracts the feature data of the speech signal in accordance with the recognition parameters set by said setting means (18).

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (演算処理) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JPH07248790A
CLAIM 3
[Claim 3] The speech recognition system according to claim 2, characterized in that the setting means (18) determines the recognition parameters to be set by applying prescribed arithmetic processing [演算処理] (calculating circuit) to the combination of recognition parameter values indicated by the detected execution-environment values read out from a table.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JPH06348292A

Filed: 1993-06-03     Issued: 1994-12-22

Speech recognition system [音声認識システム]

(Original Assignee) Nec Corp; 日本電気株式会社     

Takao Watanabe, 隆夫 渡辺
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (の距離) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JPH06348292A
CLAIM 1
[Claim 1] A speech recognition system that performs recognition using reference patterns expressed as time sequences of groups of frame vectors, comprising: storage means for holding the vector time sequence of an input pattern; storage means for holding a tree-structured set of frame vectors of the reference patterns; cluster-distance calculating means for calculating the distance [距離] (distance calculation, calculating distances) between the input-pattern vector at each time and the cluster vector corresponding to a specified node (cluster); child-node selecting means for selecting, from among the child nodes of a specified node (cluster), one or more child nodes in order of increasing distance value; frame-distance calculating means which, by controlling said cluster-distance calculating means and said child-node selecting means, calculates a group of distances to the frame vectors (frame distances) using the cluster distances computed successively while following the selected child nodes from the root node; and means for performing matching against the reference patterns using said group of frame distances.

US7979277B2
CLAIM 15
. A speech recognition method (フレームベクトル, システム) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JPH06348292A
CLAIM 1
[Claim 1] A speech recognition system [システム] (speech recognition method) that performs recognition using reference patterns expressed as time sequences of groups of frame vectors [フレームベクトル] (speech recognition method), comprising: storage means for holding the vector time sequence of an input pattern; storage means for holding a tree-structured set of frame vectors of the reference patterns; cluster-distance calculating means for calculating the distance between the input-pattern vector at each time and the cluster vector corresponding to a specified node (cluster); child-node selecting means for selecting, from among the child nodes of a specified node (cluster), one or more child nodes in order of increasing distance value; frame-distance calculating means which, by controlling said cluster-distance calculating means and said child-node selecting means, calculates a group of distances to the frame vectors (frame distances) using the cluster distances computed successively while following the selected child nodes from the root node; and means for performing matching against the reference patterns using said group of frame distances.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (フレームベクトル, システム) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (の距離) , and to the word identification .
JPH06348292A
CLAIM 1
[Claim 1] A speech recognition system [システム] (speech recognition method) that performs recognition using reference patterns expressed as time sequences of groups of frame vectors [フレームベクトル] (speech recognition method), comprising: storage means for holding the vector time sequence of an input pattern; storage means for holding a tree-structured set of frame vectors of the reference patterns; cluster-distance calculating means for calculating the distance [距離] (distance calculation, calculating distances) between the input-pattern vector at each time and the cluster vector corresponding to a specified node (cluster); child-node selecting means for selecting, from among the child nodes of a specified node (cluster), one or more child nodes in order of increasing distance value; frame-distance calculating means which, by controlling said cluster-distance calculating means and said child-node selecting means, calculates a group of distances to the frame vectors (frame distances) using the cluster distances computed successively while following the selected child nodes from the root node; and means for performing matching against the reference patterns using said group of frame distances.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1400814A2

Filed: 2003-09-17     Issued: 2004-03-24

Directional setting apparatus, directional setting system, directional setting method and directional setting program

(Original Assignee) Toshiba Corp     (Current Assignee) Toshiba Corp

Amada Tadashi, Takumi Yamamoto
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (detection period) and/or availability of space for storing more feature vectors and/or distance results .
EP1400814A2
CLAIM 1
A directional setting apparatus , comprising : a voice recognition unit which detects a certain voice included in a sound signal outputted from a microphone array having a plurality of microphones and a directional determination period indicating a detection period (distance results) of said certain voice ;
a voice direction detector which detects occurrence direction of said certain voice in said directional determination period ;
and a directional controller which controls directivity of a prescribed apparatus based on the sound signals inputted from said plurality of microphones in said directional determination period .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (signals output) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1400814A2
CLAIM 2
The directional setting apparatus according to claim 1 , wherein said directional controller controls the directivity of said prescribed apparatus , based on the sound signal which is generated by delaying the sound signals output (calculating means) ted from said plurality of microphones in said directional determination period with locations of said microphones and the amount of delay based on the direction of arrival of the sound signals and adding the sound signals to each other .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040054532A1

Filed: 2003-09-11     Issued: 2004-03-18

Method and processor system for processing of an audio signal

(Original Assignee) International Business Machines Corp     (Current Assignee) Nuance Communications Inc

Dieter Staiger
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (control means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US20040054532A1
CLAIM 6
. The processor system of claim 1 further comprising clock control means (calculating circuit, control means, distance calculation) for controlling a first clock signal supplied to the front-end processing means and for controlling a second clock signal supplied to the application program processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal has been invoked .
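The threshold-driven stop/restart behaviour recited in elements (e) and (f) of US20040054532A1 claim 1 (core processing stops when the dual-access buffer falls below a threshold and resumes when refilled to it) can be sketched as a buffer with a start/stop watermark. The threshold value, the items, and the log are invented for the sketch.

```python
from collections import deque

THRESHOLD = 3

class DualAccessBuffer:
    """Toy model of the claimed buffering between front end and core stage."""
    def __init__(self):
        self.buf, self.core_running, self.log = deque(), False, []

    def produce(self, item):               # front-end side
        self.buf.append(item)
        if not self.core_running and len(self.buf) >= THRESHOLD:
            self.core_running = True       # element (f): trigger core processing
            self.log.append("start")

    def consume(self):                     # core side
        if self.core_running and self.buf:
            item = self.buf.popleft()
            if len(self.buf) < THRESHOLD:
                self.core_running = False  # element (e): stop until refilled
                self.log.append("stop")
            return item

b = DualAccessBuffer()
for i in range(3):
    b.produce(i)        # fills to the threshold -> core starts
b.consume()             # drops below the threshold -> core stops
print(b.log)  # -> ['start', 'stop']
```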

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (control means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20040054532A1
CLAIM 6
. The processor system of claim 1 further comprising clock control means (calculating circuit, control means, distance calculation) for controlling a first clock signal supplied to the front-end processing means and for controlling a second clock signal supplied to the application program processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal has been invoked .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (control means) adapted to implement frame dropping , to discard one or more audio time frames .
US20040054532A1
CLAIM 6
. The processor system of claim 1 further comprising clock control means (calculating circuit, control means, distance calculation) for controlling a first clock signal supplied to the front-end processing means and for controlling a second clock signal supplied to the application program processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal has been invoked .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (audio signal) for a predetermined time frame .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (audio signal) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals (control signal) to the search stage when the distances for a new frame are available in a result memory .
US20040054532A1
CLAIM 11
. The processor system of claim 9 further comprising clock control means for controlling a first clock signal supplied to the core processing means and for controlling of a second control signal (accelerator signals) supplied to the back-end processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (audio signal) .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (audio signal) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (control means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .
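
The pipelined data flow recited in claim 15 (front end, then distance calculation, then search) can be sketched with Python generators: each frame moves through all three stages before the next frame is fully produced. The toy two-component features, squared-distance metric, and one-state-per-word "search" below are our simplifications for illustration, not the patent's method:

```python
def front_end(samples, frame_size=4):
    """Stage 1: split the audio signal into frames and emit a feature
    vector per frame (here just a mean/peak pair as a stand-in)."""
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        yield (sum(frame) / frame_size, max(frame))

def distance_stage(features, acoustic_states):
    """Stage 2: for each feature vector, a distance to each state."""
    for f in features:
        yield [sum((a - b) ** 2 for a, b in zip(f, s)) for s in acoustic_states]

def search_stage(distances, words):
    """Stage 3: pick the word whose state scored best (a toy search)."""
    for d in distances:
        yield words[min(range(len(d)), key=d.__getitem__)]
```

Chaining the generators gives the pipelined flow: a downstream stage pulls each frame's result as soon as the upstream stage yields it.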

US20040054532A1
CLAIM 6
. The processor system of claim 1 further comprising clock control means (calculating circuit, control means, distance calculation) for controlling a first clock signal supplied to the front-end processing means and for controlling a second clock signal supplied to the application program processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal has been invoked .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (control means) , and to the word identification .
US20040054532A1
CLAIM 1
. A processor system for speech recognition processing of an audio signal (audio signal, speech accelerator) , the processor system comprising : a) front-end processing means for preprocessing of the audio signal and for generating an output of first data , b) core processing means for performing speech recognition processing on the first data , c) dual access storage means for buffering of the first data , the dual access storage means being coupled between the front-end processing means and the core processing means , d) means for invoking the core processing means for performing the speech recognition processing after a time interval following a start of the preprocessing by the front-end processing means , e) means for stopping the execution of the speech recognition processing by the core processing means when the amount of first data stored in the dual access storage means falls below a predefined threshold level , f) means for triggering the execution of the speech recognition processing by the core processing means when the dual access storage means is refilled by first data to a level equal to or above the threshold level .

US20040054532A1
CLAIM 6
. The processor system of claim 1 further comprising clock control means (calculating circuit, control means, distance calculation) for controlling a first clock signal supplied to the front-end processing means and for controlling a second clock signal supplied to the application program processing means , the clock control means being adapted to invoke the second clock signal the time interval after the first clock signal has been invoked .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040117189A1

Filed: 2003-08-29     Issued: 2004-06-17

Query engine for processing voice based queries including semantic decoding

(Original Assignee) Bennett Ian M.     (Current Assignee) Nuance Communications Inc

Ian Bennett
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040117189A1
CLAIM 12
. The system of claim 11 , wherein said client generates an amount of speech data (audio time frame, time frame) that is optimized to reduce recognition latencies .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US20040117189A1
CLAIM 12
. The system of claim 11 , wherein said client generates an amount of speech data (audio time frame, time frame) that is optimized to reduce recognition latencies .
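
Claim 7's feature vector of "spectral components of an audio signal for a predetermined time frame" can be illustrated with a per-frame DFT magnitude per frequency bin. The naive O(N^2) transform below is our simplification (a real front end would use an FFT plus mel filtering, which the claim does not require), and the function name is ours:

```python
import math

def spectral_feature_vector(frame):
    """One spectral magnitude per DFT bin for a single time frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):   # bins up to the Nyquist frequency
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(math.hypot(re, im))
    return spectrum
```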

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (server architecture) signal .
US20040117189A1
CLAIM 11
. The system of claim 1 , wherein said speech recognition is distributed across a client-server architecture (digital audio, digital audio signal) .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040117189A1
CLAIM 12
. The system of claim 11 , wherein said client generates an amount of speech data (audio time frame, time frame) that is optimized to reduce recognition latencies .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20040117189A1
CLAIM 12
. The system of claim 11 , wherein said client generates an amount of speech data (audio time frame, time frame) that is optimized to reduce recognition latencies .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20040117189A1
CLAIM 12
. The system of claim 11 , wherein said client generates an amount of speech data (audio time frame, time frame) that is optimized to reduce recognition latencies .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040083092A1

Filed: 2003-08-08     Issued: 2004-04-29

Apparatus and methods for developing conversational applications

(Original Assignee) Valles Luis Calixto     (Current Assignee) Gyrus Logic Inc

Luis Valles
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (calculated distance) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040083092A1
CLAIM 24
: The method of claim 23 , further comprising the step of calculating a weight representing a correspondence between the relation between two words in said vector of objects to the relation between two nodes in said tree , by multiplying the pre-calculated distance (calculating distances) between the two semantic objects times said score whereby the semantic objects , in said vector of objects , with the highest and lowest offset positions provide the boundaries of the answer within the text , and such positions are used to retrieve the answer and to store it in an answer object .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (second program) , and to the word identification .
US20040083092A1
CLAIM 6
: The apparatus of claim 1 , wherein said program logic responsive to said natural language phrases input , further comprises : a . first program object code for parsing said natural language phrase with a user grammar that contains production rules for sentences specific to a transactional application b . second program (distance calculation) object code for parsing said natural language phrase with a predefined universal grammar that contains a comprehensive set of syntactical rules of a specific natural language and are independent of any application c . third program object code for converting said natural language phrase into its semantic representation d . forth program object code for determining : whether said semantic representation is to be interpreted as a precise transaction and , whether said semantic representation is to be interpreted as a precise query  whenever the first program object code successfully parses the natural language phrase with said user grammar e . fifth program object code for determining whether said semantic representation is to be interpreted as a fuzzy query and cause a speculative search , whenever the first program object code is not successful and the second program object code is successful .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1474379A

Filed: 2003-07-02     Issued: 2004-02-11

Speech recognition/response system, speech recognition/response program, and recording medium therefor

(Original Assignee) Pioneer Corp     (Current Assignee) Pioneer Corp

小林载彦, 市原直彦, 小田川智
US7979277B2
CLAIM 1
. A speech recognition circuit (语音识别, 识别用户, 识别单元) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 2
. A speech recognition circuit (语音识别, 识别用户, 识别单元) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 3
. A speech recognition circuit (语音识别, 识别用户, 识别单元) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 4
. A speech recognition circuit (语音识别, 识别用户, 识别单元) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 5
. A speech recognition circuit (语音识别, 识别用户, 识别单元) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state (通过将) defined by the acoustic model .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

CN1474379A
CLAIM 2
. The system according to claim 1, wherein: said utterance feature information comprises a plurality of utterance feature categories, the utterance feature categories being obtained by classifying (acoustic state) the user's utterance features into a plurality of groups; and said utterance feature analysis unit (20) selects an utterance feature category from said plurality of utterance feature categories according to said recognition result, so as to output said utterance feature category.

US7979277B2
CLAIM 6
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.
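
Claim 6's frame dropping can be illustrated as a control policy that discards audio time frames. The fixed keep-every-other-frame decimation below is our assumption for the sketch; the claim itself only requires that one or more frames be discarded:

```python
def drop_frames(frames, keep_every=2):
    """Discard audio time frames by fixed decimation, keeping one
    frame out of every `keep_every`, so a lagging downstream stage
    can catch up."""
    return [f for i, f in enumerate(frames) if i % keep_every == 0]
```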

US7979277B2
CLAIM 7
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 8
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 9
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein the speech accelerator (语音输入) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input (speech accelerator), recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.
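
Claim 9's handshake, in which the accelerator interrupts the front end when it can accept the next feature vector, can be sketched with a `threading.Event` standing in for the interrupt line. The function and variable names are ours, and the software queue is only a stand-in for the hardware interface:

```python
import queue
import threading

def run_handshake(features):
    """Front end hands each feature vector to the accelerator only
    after the accelerator raises its 'ready' signal."""
    ready = threading.Event()
    channel = queue.Queue(maxsize=1)
    received = []

    def accelerator():
        for _ in features:
            ready.set()                  # interrupt: ready for next vector
            received.append(channel.get())

    worker = threading.Thread(target=accelerator)
    worker.start()
    for f in features:                   # front end waits for the interrupt
        ready.wait()
        ready.clear()
        channel.put(f)
    worker.join()
    return received
```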

US7979277B2
CLAIM 10
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 11
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 12
. The speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

US7979277B2
CLAIM 13
. A speech recognition circuit (语音识别, 识别用户, 识别单元) of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.
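
Claim 13 specifies that the distance "comprises a Mahalanobis distance". A minimal sketch under a diagonal-covariance assumption (common for HMM acoustic models but not required by the claim; the function name is ours):

```python
import math

def mahalanobis_distance(feature, mean, variance):
    """Mahalanobis distance between a feature vector and an acoustic
    state, with the state's covariance assumed diagonal (so each
    squared difference is scaled by that dimension's variance)."""
    return math.sqrt(sum((f - m) ** 2 / v
                         for f, m, v in zip(feature, mean, variance)))
```

With equal unit variances this reduces to the ordinary Euclidean distance, which is why the diagonal form is a popular hardware-friendly choice.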

US7979277B2
CLAIM 14
. A speech recognition circuit (语音识别, 识别用户, 识别单元) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state (通过将) of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1474379A
CLAIM 1
. A speech recognition (speech recognition circuit)/response system, comprising: an utterance recognition unit (speech recognition circuit) (10), which, through the user's voice input, recognizes the user's (speech recognition circuit) utterance content and outputs a recognition result; a dialogue control processing unit (40), which controls the course of the dialogue with the user according to said recognition result so as to determine response content for said user; an utterance feature analysis unit (20), which analyzes said user's utterance features so as to generate utterance feature information; and a response voice generation unit (30), which generates a response voice for said user according to said response content and said utterance feature information.

CN1474379A
CLAIM 2
. The system according to claim 1, wherein: said utterance feature information comprises a plurality of utterance feature categories, the utterance feature categories being obtained by classifying (acoustic state) the user's utterance features into a plurality of groups; and said utterance feature analysis unit (20) selects an utterance feature category from said plurality of utterance feature categories according to said recognition result, so as to output said utterance feature category.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state (通过将) of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
CN1474379A
CLAIM 2
. The system according to claim 1, wherein: said utterance feature information comprises a plurality of utterance feature categories, the utterance feature categories being obtained by classifying (acoustic state) the user's utterance features into a plurality of groups; and said utterance feature analysis unit (20) selects an utterance feature category from said plurality of utterance feature categories according to said recognition result, so as to output said utterance feature category.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state (通过将) of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
CN1474379A
CLAIM 2
. The system according to claim 1, wherein: said utterance feature information comprises a plurality of utterance feature categories, the utterance feature categories being obtained by classifying (acoustic state) the user's utterance features into a plurality of groups; and said utterance feature analysis unit (20) selects an utterance feature category from said plurality of utterance feature categories according to said recognition result, so as to output said utterance feature category.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040148170A1

Filed: 2003-05-30     Issued: 2004-07-29

Statistical classifiers for spoken language understanding and command/control scenarios

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Alejandro Acero, Ciprian Chelba, YeYi Wang, Leon Wong, Ravi Shahani, Michael Calcagno, Domenic Cipollone, Curtis Huttenhower
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends (search service) as separate threads .
US20040148170A1
CLAIM 74
. A computer-implemented method of processing textual input , comprising : performing statistical classification on the textual input to obtain a target class associated with the textual input ;
and forwarding the textual input to a search service (front ends) if the target class identified relates to the textual input comprising a search query .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (feature vector) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (statistical language) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (feature vector) from the front end .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .

US20040148170A1
CLAIM 17
. The text classifier of claim 1 wherein the statistical classifier comprises a plurality of class-specific statistical language (speech accelerator) models .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US20040148170A1
CLAIM 19
. The text classifier of claim 1 and further comprising : a speech recognizer receiving a speech signal (result memory) indicative of the natural language input and providing the textual input .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector (feature vector) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .
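
Claim 15 requires only "a distance indicating the similarity" between a feature vector and an acoustic state. One conventional concrete reading, assumed here for illustration and not mandated by the claim, is the negative log-likelihood of a diagonal-covariance Gaussian acoustic state, where a smaller distance means greater similarity:

```python
import math

# Hedged example: negative log-likelihood of a feature vector under a
# diagonal-covariance Gaussian acoustic state N(means, diag(variances)).
# Smaller values indicate greater similarity.

def gaussian_distance(fv, means, variances):
    """Negative log-likelihood of fv under N(means, diag(variances))."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(fv, means, variances))

# A feature vector at the state mean is "nearer" than one far from it.
near = gaussian_distance([1.0, 2.0], [1.0, 2.0], [1.0, 1.0])
far = gaussian_distance([5.0, 9.0], [1.0, 2.0], [1.0, 1.0])
```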

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20040148170A1
CLAIM 1
. A text classifier in a natural language interface that receives a natural language user input , the text classifier comprising : a feature extractor extracting a feature vector (feature vector) from a textual input indicative of the natural language user input ;
a statistical classifier coupled to the feature extractor outputting a class identifier identifying a target class associated with the textual input based on the feature vector .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040078202A1

Filed: 2003-02-14     Issued: 2004-04-22

Speech input communication system, user terminal and center system

(Original Assignee) Sharp Corp     (Current Assignee) Sharp Corp

Shin Kamiya
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (control means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040078202A1
CLAIM 4
. The voice-input communication system as defined in claim 3 , wherein at least a final-stage center system is provided with output control means (calculating circuit, control means, distance calculation) for outputting an instruction content recognized by the voice instruction recognition processing means .
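
Claim 1's two-processor split can be pictured as a producer/consumer arrangement. In the sketch below (the mapping of processors to threads is an assumption, not the patented hardware), a "first processor" thread runs the front end and search stage, a "second processor" thread runs the distance calculation, and bounded queues pipeline data between them:

```python
import queue
import threading

fv_q = queue.Queue(maxsize=2)    # front end -> calculating circuit
dist_q = queue.Queue(maxsize=2)  # calculating circuit -> search stage
STATES = [0.0, 4.0]              # toy one-dimensional acoustic states

def second_processor():
    # Distance calculation runs on its own "processor".
    while True:
        fv = fv_q.get()
        if fv is None:                       # end-of-stream marker
            dist_q.put(None)
            return
        dist_q.put([(fv - s) ** 2 for s in STATES])

def first_processor(audio, results):
    worker = threading.Thread(target=second_processor)
    worker.start()
    for sample in audio:                     # front-end role
        fv_q.put(float(sample))              # toy "feature vector"
    fv_q.put(None)
    while True:                              # search-stage role
        d = dist_q.get()
        if d is None:
            break
        results.append(d.index(min(d)))      # best acoustic state per frame
    worker.join()

results = []
first_processor([1, 3, 5], results)
```

The bounded queues are what make the flow pipelined rather than batch: each stage can run as soon as its input is available.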

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (control means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20040078202A1
CLAIM 4
. The voice-input communication system as defined in claim 3 , wherein at least a final-stage center system is provided with output control means (calculating circuit, control means, distance calculation) for outputting an instruction content recognized by the voice instruction recognition processing means .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (control means) adapted to implement frame dropping , to discard one or more audio time frames .
US20040078202A1
CLAIM 4
. The voice-input communication system as defined in claim 3 , wherein at least a final-stage center system is provided with output control means (calculating circuit, control means, distance calculation) for outputting an instruction content recognized by the voice instruction recognition processing means .
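
Claim 6's frame dropping can be sketched as a controller that discards audio time frames arriving while the recognizer cannot accept them, rather than queuing them indefinitely. The "busy" model below is an assumption for illustration:

```python
# Hedged sketch of frame dropping: frames that arrive while the
# recognizer is busy are discarded by the control means.

def drop_frames(frames, busy_pattern):
    # busy_pattern[i] is True when the recognizer cannot accept frame i.
    kept = [f for f, busy in zip(frames, busy_pattern) if not busy]
    return kept, len(frames) - len(kept)

kept, dropped = drop_frames([0, 1, 2, 3], [False, True, False, True])
```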

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20040078202A1
CLAIM 5
. The voice-input communication system as defined in claim 2 , wherein either the user terminal or the user-side system in the user system is provided with voice recognition (speech accelerator) means as the partial voice instruction recognition processing means for recognizing an inputted voice and outputting an interim recognition result , and provided with transmission control means for transmitting the interim recognition result to the center system through the first communication line .
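
Claim 9's interrupt can be pictured as a "ready" line from the accelerator to the front end. The sketch below models it sequentially with a boolean flag; a real accelerator would raise a hardware interrupt:

```python
# Hedged model of the ready/next-feature-vector handshake of claim 9.

class Accelerator:
    def __init__(self):
        self.ready = True      # interrupt line: ready for next vector
        self.current = None

    def load(self, fv):
        assert self.ready, "front end must wait for the ready interrupt"
        self.current, self.ready = fv, False

    def compute(self, states):
        d = [(self.current - s) ** 2 for s in states]
        self.ready = True      # raise interrupt: next vector welcome
        return d

acc = Accelerator()
results = []
for fv in [1.0, 3.0]:
    acc.load(fv)               # only permitted while `ready` is set
    results.append(acc.compute([0.0, 4.0]))
```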

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (control means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20040078202A1
CLAIM 4
. The voice-input communication system as defined in claim 3 , wherein at least a final-stage center system is provided with output control means (calculating circuit, control means, distance calculation) for outputting an instruction content recognized by the voice instruction recognition processing means .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (control means) , and to the word identification .
US20040078202A1
CLAIM 4
. The voice-input communication system as defined in claim 3 , wherein at least a final-stage center system is provided with output control means (calculating circuit, control means, distance calculation) for outputting an instruction content recognized by the voice instruction recognition processing means .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004234273A

Filed: 2003-01-30     Issued: 2004-08-19

対話型端末装置及び対話アプリケーション提供方法 (Interactive terminal device and method for providing interactive applications)

(Original Assignee) Hitachi Ltd

Toshihiro Kujirai
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (ワーク) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004234273A
CLAIM 4
The terminal device according to any one of claims 1 to 3, further comprising a communication unit connected to an external server via a network (audio time frame, time frame), wherein the control unit, upon acquiring the application via the communication unit, reads the global commands included in that application into the recording unit.

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (ワーク) .
JP2004234273A
CLAIM 4
The terminal device according to any one of claims 1 to 3, further comprising a communication unit connected to an external server via a network (audio time frame, time frame), wherein the control unit, upon acquiring the application via the communication unit, reads the global commands included in that application into the recording unit.
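
Claim 7 reads the feature vector as "a plurality of spectral components" of the audio signal for a time frame. A minimal stand-in, assumed here since the chart does not disclose the patent's actual front end, is the magnitude of each non-redundant bin of a naive DFT over one frame of samples:

```python
import math

# Hedged sketch: one frame of samples -> magnitudes of its DFT bins,
# i.e. a feature vector of spectral components.

def spectral_feature_vector(frame):
    n = len(frame)
    fv = []
    for k in range(n // 2 + 1):              # non-redundant bins only
        re = sum(x * math.cos(2 * math.pi * k * t / n)
                 for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        fv.append(math.hypot(re, im))
    return fv

# A pure tone at bin 1 concentrates its energy in that spectral component.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
fv = spectral_feature_vector(frame)
```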

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (基づき) vector from the front end .
JP2004234273A
CLAIM 3
The terminal device according to claim 1 or 2, further comprising an input unit, wherein the control unit adds, deletes, or changes the global commands based on (next feature) input made via the input unit.

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (認識結果) .
JP2004234273A
CLAIM 1
A terminal device comprising: a control unit for controlling a plurality of spoken-dialogue applications; a voice input unit; a speech recognition engine for recognizing speech input via the voice input unit; and a recording unit for recording global commands that enable dialogue with the application currently in dialogue and with the plurality of applications other than it; wherein the control unit manages the global commands in association with each operating state of the plurality of applications and, when the speech recognition result (result memory) is a global command, executes the processing associated with that global command.
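
Claim 10 has the accelerator signal the search stage when a new frame's distances are available in a result memory. The flag mechanism below is an assumption made for illustration:

```python
# Hedged sketch: the accelerator writes distances into a result memory
# and raises a flag that the search stage polls before reading.

class ResultMemory:
    def __init__(self):
        self.distances = None
        self.new_frame = False    # signal line to the search stage

    def write(self, distances):   # accelerator side
        self.distances = distances
        self.new_frame = True     # "distances for a new frame available"

    def read(self):               # search-stage side
        assert self.new_frame, "search stage must wait for the signal"
        self.new_frame = False
        return self.distances

mem = ResultMemory()
mem.write([0.5, 2.0])
frame_distances = mem.read()
```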

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (ワーク) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2004234273A
CLAIM 4
The terminal device according to any one of claims 1 to 3, further comprising a communication unit connected to an external server via a network (audio time frame, time frame), wherein the control unit, upon acquiring the application via the communication unit, reads the global commands included in that application into the recording unit.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (ワーク) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2004234273A
CLAIM 4
The terminal device according to any one of claims 1 to 3, further comprising a communication unit connected to an external server via a network (audio time frame, time frame), wherein the control unit, upon acquiring the application via the communication unit, reads the global commands included in that application into the recording unit.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (ワーク) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2004234273A
CLAIM 4
The terminal device according to any one of claims 1 to 3, further comprising a communication unit connected to an external server via a network (audio time frame, time frame), wherein the control unit, upon acquiring the application via the communication unit, reads the global commands included in that application into the recording unit.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004226881A

Filed: 2003-01-27     Issued: 2004-08-12

会話システム及び会話処理プログラム (Conversation system and conversation processing program)

(Original Assignee) Casio Computer Co Ltd

Takashi Matsuda
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (前記発) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (メモリ) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.

JP2004226881A
CLAIM 17
A conversation processing program for use on a computer that holds conversations with a user, the program causing the computer to realize: a function of inputting an utterance of the user who is the conversation partner; a function of speech-recognizing the input user utterance; a function of extracting volume information or pitch information for each phrase contained in the user's utterance; a function of processing, based on the extracted volume or pitch information for each phrase, phrases in the recognition result information obtained by the speech recognition that are distinctive in volume or pitch, according to those characteristics; a function of storing the recognition result information containing the processed phrases in a memory (first processor) together with the utterance date and time as past utterance information; a function of creating an utterance in reply to the user's utterance using the past utterance information stored in the memory; and a function of outputting the created utterance.

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (メモリ) .
JP2004226881A
CLAIM 17
A conversation processing program for use on a computer that holds conversations with a user, the program causing the computer to realize: a function of inputting an utterance of the user who is the conversation partner; a function of speech-recognizing the input user utterance; a function of extracting volume information or pitch information for each phrase contained in the user's utterance; a function of processing, based on the extracted volume or pitch information for each phrase, phrases in the recognition result information obtained by the speech recognition that are distinctive in volume or pitch, according to those characteristics; a function of storing the recognition result information containing the processed phrases in a memory (first processor) together with the utterance date and time as past utterance information; a function of creating an utterance in reply to the user's utterance using the past utterance information stored in the memory; and a function of outputting the created utterance.

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (メモリ) should run the front end or search stage code , based on availability or unavailability of distance results (判定結果) and/or availability of space for storing more feature vectors and/or distance results .
JP2004226881A
CLAIM 4
A conversation system for holding conversations with a user, comprising: input means for inputting an utterance of the user who is the conversation partner; speech recognition means for speech-recognizing the user utterance input from the input means; pitch extraction means for extracting pitch information for each phrase contained in the user's utterance; first determination means for determining, based on the pitch information for each phrase extracted by the pitch extraction means, whether an interrogative meaning is contained in the recognition result information obtained by the speech recognition means; utterance creation means for creating an utterance in reply to the user's utterance according to the determination result (distance results, result memory) of the first determination means; and output means for outputting the utterance created by the utterance creation means.

JP2004226881A
CLAIM 17
A conversation processing program for use on a computer that holds conversations with a user, the program causing the computer to realize: a function of inputting an utterance of the user who is the conversation partner; a function of speech-recognizing the input user utterance; a function of extracting volume information or pitch information for each phrase contained in the user's utterance; a function of processing, based on the extracted volume or pitch information for each phrase, phrases in the recognition result information obtained by the speech recognition that are distinctive in volume or pitch, according to those characteristics; a function of storing the recognition result information containing the processed phrases in a memory (first processor) together with the utterance date and time as past utterance information; a function of creating an utterance in reply to the user's utterance using the past utterance information stored in the memory; and a function of outputting the created utterance.
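
Claim 3's dynamic scheduling decides whether the first processor should run front-end or search-stage code from the availability of distance results and of buffer space. The concrete policy below is invented; the claim covers any such availability-driven decision:

```python
# Hedged sketch of availability-driven scheduling on the first processor.

def schedule(fv_space, distance_results_ready):
    if distance_results_ready:
        return "search"       # consume the available distance results
    if fv_space > 0:
        return "front_end"    # produce feature vectors into free space
    return "idle"             # wait for the calculating circuit

trace = [schedule(2, False), schedule(1, True), schedule(0, False)]
```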

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (メモリ) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JP2004226881A
CLAIM 17
A conversation processing program for use on a computer that holds conversations with a user, the program causing the computer to realize: a function of inputting an utterance of the user who is the conversation partner; a function of speech-recognizing the input user utterance; a function of extracting volume information or pitch information for each phrase contained in the user's utterance; a function of processing, based on the extracted volume or pitch information for each phrase, phrases in the recognition result information obtained by the speech recognition that are distinctive in volume or pitch, according to those characteristics; a function of storing the recognition result information containing the processed phrases in a memory (first processor) together with the utterance date and time as past utterance information; a function of creating an utterance in reply to the user's utterance using the past utterance information stored in the memory; and a function of outputting the created utterance.

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (前記発) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (前記発) from the front end .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (判定結果) .
JP2004226881A
CLAIM 4
A conversation system for holding conversations with a user, comprising: input means for inputting an utterance of the user who is the conversation partner; speech recognition means for speech-recognizing the user utterance input from the input means; pitch extraction means for extracting pitch information for each phrase contained in the user's utterance; first determination means for determining, based on the pitch information for each phrase extracted by the pitch extraction means, whether an interrogative meaning is contained in the recognition result information obtained by the speech recognition means; utterance creation means for creating an utterance in reply to the user's utterance according to the determination result (distance results, result memory) of the first determination means; and output means for outputting the utterance created by the utterance creation means.

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (前記発) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector (前記発) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector (前記発) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2004226881A
CLAIM 6
The conversation system according to claim 4, further comprising second determination means for, when the recognition result information contains a specific phrase carrying a negative or affirmative meaning, examining the change in volume of that phrase to determine whether the meaning is negative or affirmative, wherein said utterance (feature vector) creation means creates an utterance in reply to the user's utterance according to the determination result of the second determination means.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040128137A1

Filed: 2003-01-27     Issued: 2004-07-01

Hands-free, voice-operated remote control transmitter

(Original Assignee) Bush William Stuart; Roura Carlos Ferdinand     

William Bush, Carlos Roura
US7979277B2
CLAIM 1
. A speech recognition circuit (speech recognition circuit) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 2
. A speech recognition circuit (speech recognition circuit) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 3
. A speech recognition circuit (speech recognition circuit) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 4
. A speech recognition circuit (speech recognition circuit) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 5
. A speech recognition circuit (speech recognition circuit) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 6
. The speech recognition circuit (speech recognition circuit) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit , and ;
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .
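For context on the claim 6 limitation, the following is a minimal sketch of frame dropping: when the buffer feeding the recognition pipeline is full, one or more audio time frames are discarded so that real-time input is never blocked. All names (`enqueue_frame`, `max_depth`) are illustrative assumptions, not terms from either patent.

```python
from collections import deque

def enqueue_frame(queue, frame, max_depth=4):
    """Frame-dropping sketch: if the frame buffer backing the
    recognition pipeline is full, discard the oldest audio time
    frame rather than stall the audio input."""
    dropped = None
    if len(queue) >= max_depth:
        dropped = queue.popleft()  # discard one audio time frame
    queue.append(frame)
    return dropped

q = deque()
drops = [enqueue_frame(q, i) for i in range(6)]  # frames 0..5
```

Frames 0 through 3 fill the buffer; frames 4 and 5 each force the oldest frame out, which is the essence of the claimed frame-dropping control.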

US7979277B2
CLAIM 7
. The speech recognition circuit (speech recognition circuit) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .
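To illustrate the claim 7 limitation, here is a hedged sketch of computing "a plurality of spectral components of an audio signal for a predetermined time frame" via a naive DFT magnitude spectrum. Production front ends typically use windowed FFTs and cepstral (e.g. MFCC) features; the function name and frame length are assumptions for illustration only.

```python
import math

def spectral_feature_vector(frame):
    """Sketch: one magnitude per frequency bin for a single
    fixed-length audio time frame (naive DFT, illustrative only)."""
    n = len(frame)
    feats = []
    for k in range(n // 2 + 1):  # spectral components up to Nyquist
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        feats.append(math.hypot(re, im))
    return feats

# a pure tone that falls exactly in bin 1 of an 8-sample frame
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
fv = spectral_feature_vector(frame)
```

The resulting vector concentrates its energy in bin 1, showing how a feature vector of spectral components characterizes the frame's content.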

US7979277B2
CLAIM 8
. The speech recognition circuit (speech recognition circuit) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 9
. The speech recognition circuit (speech recognition circuit) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 10
. The speech recognition circuit (speech recognition circuit) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 11
. The speech recognition circuit (speech recognition circuit) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US7979277B2
CLAIM 12
. The speech recognition circuit (speech recognition circuit) of claim 1 , wherein the audio front end is configured to input a digital audio (voice signals) signal .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .

US20040128137A1
CLAIM 9
. A voice-activated control system for controlling one or more selected appliances , the control system comprising : a voice recognition system for receiving representative voice signals (digital audio, digital audio signal) and converting said representative voice signals to one or more command signals for controlling one or more selected appliances , the voice recognition system including a plurality of linked speech recognition vocabulary sets .

US7979277B2
CLAIM 13
. A speech recognition circuit (speech recognition circuit) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .
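Claim 13 names the Mahalanobis distance explicitly. A minimal sketch follows, assuming (as is common in HMM acoustic models, though not stated in the claim) a diagonal covariance for the Gaussian acoustic state; the argument names are illustrative.

```python
import math

def mahalanobis(feature, mean, variances):
    """Mahalanobis distance between a feature vector and a Gaussian
    acoustic state with diagonal covariance (illustrative sketch)."""
    return math.sqrt(sum((x - m) ** 2 / v
                         for x, m, v in zip(feature, mean, variances)))

d = mahalanobis([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

Unlike plain Euclidean distance, each dimension is scaled by that dimension's variance, so the second component's larger spread (variance 4.0) contributes less to the score.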

US7979277B2
CLAIM 14
. A speech recognition circuit (speech recognition circuit) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040128137A1
CLAIM 5
. A voice actuated control system for controlling one or more appliances , the control system comprising : a microphone for receiving audio signals and converting said audio signals to electrical signals ;
an audio circuit for receiving said electrical signals from said microphone and coupling said microphone to a sound activation circuit in one or more modes including a sleep mode and a speech recognition mode ;
a sound activation circuit coupled to said microphone in a sleep mode configured to detect when electrical signals from said microphone exceed a predetermined threshold and generate a wake-up signal to said speech recognition circuit (speech recognition circuit) , causing circuit to switch to said speech recognition circuit ; and
a speech recognition circuit having one or more modes of operation including a speech recognition mode for generating first control signals when the electrical signals from said microphone represent one or more predetermined command signals or alternatively generating a sleep signal to cause said audio circuit to return to sleep mode after a predetermined time period when no command signals are detected ;
and an appliance control circuit for receiving said first control signals from said speech recognition circuit and generating second control signals to cause the appliance to perform the desired command .
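Claim 14 recites three stages (audio front end, distance calculation, search) connected "to enable pipelined data flow." A minimal sketch using Python generators follows; laziness means each stage consumes results as the previous stage produces them, loosely emulating the pipelining. The squared-Euclidean distance and threshold search here are illustrative stand-ins, not the patent's methods.

```python
def front_end(samples, frame_len=4):
    """Stage 1: yield one 'feature vector' per audio time frame
    (here just the raw frame, for brevity)."""
    for i in range(0, len(samples), frame_len):
        yield samples[i:i + frame_len]

def distance_stage(feature_vectors, state_mean):
    """Stage 2: yield a distance per feature vector against one
    illustrative acoustic state."""
    for fv in feature_vectors:
        yield sum((x - m) ** 2 for x, m in zip(fv, state_mean))

def search_stage(distances, threshold):
    """Stage 3: consume distances as they become available; keeping
    frames under a threshold stands in for lexical-tree search."""
    return [i for i, d in enumerate(distances) if d < threshold]

samples = [0, 0, 0, 0, 9, 9, 9, 9]
hits = search_stage(distance_stage(front_end(samples), [0, 0, 0, 0]), 10)
```

Only the first frame matches the illustrative acoustic state, so only its index survives the search stage.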

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (control unit) , and to the word identification .
US20040128137A1
CLAIM 1
. A voice actuated control system for controlling appliances comprising : a microphone for receiving audio signals ;
a speech recognition system for receiving audio signals ;
said speech recognition system having a low power sound activation mode for detecting the presence of audio signals and a speech recognition mode for decoding said audio signals and generating control signals for controlling one or more appliances , said speech recognition system including a circuit for switching between said sound activation mode and a speech recognition mode as a function of the amplitude of said audio signals ;
and an appliance control circuit which includes a transmitter , said appliance control unit (distance calculation) for receiving said control signals from said speech recognition system and generating and transmitting an appliance control signal to a selected appliance .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040138890A1

Filed: 2003-01-09     Issued: 2004-07-15

Voice browser dialog enabler for a communication system

(Original Assignee) Motorola Inc     (Current Assignee) Google Technology Holdings LLC

James Ferrans, Jonathan Engelsma, Michael Pearce, Mark Randolph, Jerome Vogedes
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .
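The claim 3 limitation is a scheduling decision: run the search stage when distance results are available, otherwise run the front end while there is space to store more feature vectors. The sketch below encodes one plausible reading of that policy; the function and return labels are assumptions, not the patent's.

```python
def schedule(feature_space_free, distances_ready):
    """Dynamic-scheduling sketch: choose which code the first
    processor runs next, based on availability of distance results
    and of space for more feature vectors (one plausible policy)."""
    if distances_ready:
        return "search"       # distance results waiting: run search stage
    if feature_space_free:
        return "front_end"    # room for more feature vectors: run front end
    return "stall"            # nothing productive to do this cycle

choices = [schedule(f, d) for f in (True, False) for d in (True, False)]
```

Prioritizing the search stage drains the result memory first, which keeps the distance-calculating circuit from backing up.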

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .
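For the claim 4 limitation (front end and search stage "as separate threads" on one multi-threaded processor), the following is a minimal sketch: two threads pipelined through a queue, with a sentinel to terminate. The doubling in the consumer is a stand-in for search-stage work; all names are illustrative.

```python
import queue
import threading

def run_threads(frames):
    """Sketch: front end and search stage as separate threads,
    pipelined through a thread-safe queue."""
    q = queue.Queue()
    results = []

    def front_end():
        for f in frames:
            q.put(f)      # hand each feature vector to the search thread
        q.put(None)       # sentinel: no more frames

    def search():
        while (f := q.get()) is not None:
            results.append(f * 2)  # stand-in for search-stage work

    t1 = threading.Thread(target=front_end)
    t2 = threading.Thread(target=search)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

out = run_threads([1, 2, 3])
```

The queue provides the producer/consumer decoupling that lets the two stages overlap in time, the property the pipelining claims depend on.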

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio (subsequent input) signal .
US20040138890A1
CLAIM 1
. A voice browser dialog enabler for a communication system , the browser enabler comprising : a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
a voice browser driver , the voice browser driver resident on a communication device ;
the voice browser driver providing the fragments from the application and generating identifiers that identify the fragments ;
and a voice browser implementation resident on a remote voice server , the voice browser implementation receiving the fragments from the voice browser driver and downloading a plurality of speech grammars , wherein subsequent input (digital audio, digital audio signal) speech is matched against those speech grammars associated with the corresponding identifiers received in a speech recognition request from the voice browser driver .

US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality of speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality of speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality of speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US20040138890A1
CLAIM 13
. A method for enabling dialog with a voice browser for a communication system , the method comprising the steps of : providing a voice browser driver resident on a communication device and a voice browser implementation containing a plurality of speech grammars resident on a remote voice server ;
running a speech recognition application comprising a plurality of units of application interaction , wherein each unit has associated voice dialog forms defining fragments ;
defining identifiers associated with each fragment ;
supplying the fragments to the voice browser implementation ;
focusing on a field in one of the units of application interaction ;
sending a speech recognition request including the identifier of the form associated with the focused field from the voice browser driver to the voice browser implementation ;
inputting and recognizing speech (speech recognition circuit, word identification) ;
matching the speech to the acceptable speech grammar associated with the identifier ;
and obtaining speech recognition results .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004212641A

Filed: 2002-12-27     Issued: 2004-07-29

Speech input system and terminal device equipped with a speech input system (音声入力システム及び音声入力システムを備えた端末装置)

(Original Assignee) Toshiba Corp; 株式会社東芝     

Masahide Arisei, 政秀 蟻生
US7979277B2
CLAIM 1
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.
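Claim 1 of US7979277B2 above describes a three-stage pipeline: the audio front end produces feature vectors, a calculating circuit turns them into distances, and a search stage consumes the distances. A minimal sketch of that data flow, using Python threads and bounded queues in place of the two hardware processors (the stage bodies and the 0.5 threshold are illustrative stand-ins, not the patent's actual computations):

```python
import queue
import threading

def front_end(audio_frames, fv_q):
    """Front end: emit one 'feature vector' per audio time frame."""
    for frame in audio_frames:
        fv_q.put(sum(frame) / len(frame))   # stand-in feature extraction
    fv_q.put(None)                          # end-of-stream marker

def distance_calc(fv_q, dist_q):
    """Calculating circuit: turn each feature vector into a 'distance'."""
    while (fv := fv_q.get()) is not None:
        dist_q.put(abs(fv - 1.0))           # stand-in acoustic distance
    dist_q.put(None)

def search_stage(dist_q, results):
    """Search stage: map each distance to a word hypothesis."""
    while (d := dist_q.get()) is not None:
        results.append("word" if d < 0.5 else "sil")

# Bounded queues model the buffers that enable pipelined data flow.
fv_q, dist_q, results = queue.Queue(maxsize=4), queue.Queue(maxsize=4), []
frames = [[0.9, 1.1], [0.1, 0.1]]
threads = [threading.Thread(target=front_end, args=(frames, fv_q)),
           threading.Thread(target=distance_calc, args=(fv_q, dist_q)),
           threading.Thread(target=search_stage, args=(dist_q, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All three stages run concurrently; each blocks on its input queue until the upstream stage has produced data, which is the essence of the claimed pipelining.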

US7979277B2
CLAIM 2
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 3
. A speech recognition circuit (音声信号) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.
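Claim 3 of US7979277B2 above conditions which code the first processor runs on the availability of distance results and of buffer space for more feature vectors. A minimal sketch of that scheduling decision (the function name and the "idle" fallback are assumptions, not taken from the patent):

```python
def next_task(distance_results_ready, feature_space_free):
    """Dynamic scheduling in the style of claim 3: run the search stage
    when distance results are available; otherwise run the front end if
    there is buffer space for more feature vectors; otherwise idle."""
    if distance_results_ready:
        return "search"
    if feature_space_free:
        return "front_end"
    return "idle"
```

In a real implementation this decision would be re-evaluated each time a stage completes, keeping the single shared processor busy on whichever stage can make progress.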

US7979277B2
CLAIM 4
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 5
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 6
. The speech recognition circuit (音声信号) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 7
. The speech recognition circuit (音声信号) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 8
. The speech recognition circuit (音声信号) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 9
. The speech recognition circuit (音声信号) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 10
. The speech recognition circuit (音声信号) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (の結果) .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

JP2004212641A
CLAIM 3
The speech input system according to claim 2, further characterized by comprising means for changing the contents of said environment information and said parameters to reflect the result (の結果: result memory) of said signal processing.

US7979277B2
CLAIM 11
. The speech recognition circuit (音声信号) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 12
. The speech recognition circuit (音声信号) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 13
. A speech recognition circuit (音声信号) of claim 1 , wherein said distance comprises a Mahalanobis distance .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.

US7979277B2
CLAIM 14
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2004212641A
CLAIM 1
A speech input system characterized by comprising: receiving means for receiving a speech signal (音声信号: speech recognition circuit); signal processing means for applying signal processing to said speech signal; storage means for storing environment information associated with time; time measuring means for measuring time; and control means for retrieving from said storage means the environment information related to the measured time and for controlling said signal processing means based on that environment information.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20040111264A1

Filed: 2002-12-10     Issued: 2004-06-10

Name entity extraction using language models

(Original Assignee) International Business Machines Corp     (Current Assignee) Nuance Communications Inc

Zhong-Hua Wang, David Lubeneky
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (following steps) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor (start node) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20040111264A1
CLAIM 1
. A method , in a data processing system , for name entity extraction , comprising : providing a general language model and one or more name entity language models , wherein the general language model and the one or more name entity language models are associated with a plurality of states including a general language model state and one or more name entity language model states associated with the one or more name entity language models ;
receiving a natural language utterance ;
starting at a start node (second processor) in the general language model , for each word in the natural language utterance , determining a best current state in the plurality of states and a best previous state in the plurality of states , until an end node in the general language model is reached ;
tracing back to determine a best path from the start node to the end node ;
identifying two or more adjacent words in the natural language utterance aligned with a name entity language model state as a name entity ;
and extracting the name entity .

US20040111264A1
CLAIM 17
. An apparatus for name entity extraction , comprising : a processor ;
and a memory electrically connected to the processor , the memory having stored therein a program to be executed on the processor for performing the following steps (calculating means, speech recognition method, calculating distances) : a) providing a general language model ;
b) setting initial probabilities in the general language model ;
c) extracting name entities in a plurality of training sentences ;
d) replacing the extracted name entities with name entity labels in the plurality of training sentences to form modified training sentences ;
e) training the general language model using the modified training sentences ;
and f) repeating steps (c) to (e) until probabilities of the general language model do not change .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (following steps) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20040111264A1
CLAIM 17
. An apparatus for name entity extraction , comprising : a processor ;
and a memory electrically connected to the processor , the memory having stored therein a program to be executed on the processor for performing the following steps (calculating means, speech recognition method, calculating distances) : a) providing a general language model ;
b) setting initial probabilities in the general language model ;
c) extracting name entities in a plurality of training sentences ;
d) replacing the extracted name entities with name entity labels in the plurality of training sentences to form modified training sentences ;
e) training the general language model using the modified training sentences ;
and f) repeating steps (c) to (e) until probabilities of the general language model do not change .

US7979277B2
CLAIM 15
. A speech recognition method (following steps) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20040111264A1
CLAIM 17
. An apparatus for name entity extraction , comprising : a processor ;
and a memory electrically connected to the processor , the memory having stored therein a program to be executed on the processor for performing the following steps (calculating means, speech recognition method, calculating distances) : a) providing a general language model ;
b) setting initial probabilities in the general language model ;
c) extracting name entities in a plurality of training sentences ;
d) replacing the extracted name entities with name entity labels in the plurality of training sentences to form modified training sentences ;
e) training the general language model using the modified training sentences ;
and f) repeating steps (c) to (e) until probabilities of the general language model do not change .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (following steps) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20040111264A1
CLAIM 17
. An apparatus for name entity extraction , comprising : a processor ;
and a memory electrically connected to the processor , the memory having stored therein a program to be executed on the processor for performing the following steps (calculating means, speech recognition method, calculating distances) : a) providing a general language model ;
b) setting initial probabilities in the general language model ;
c) extracting name entities in a plurality of training sentences ;
d) replacing the extracted name entities with name entity labels in the plurality of training sentences to form modified training sentences ;
e) training the general language model using the modified training sentences ;
and f) repeating steps (c) to (e) until probabilities of the general language model do not change .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004153732A

Filed: 2002-11-01     Issued: 2004-05-27

Care facility monitoring system (介護施設監視システム)

(Original Assignee) Toshiba Eng Co Ltd; 東芝エンジニアリング株式会社     

Yasuhiko Katsuki, 康彦 香月, Koichi Kinoshita, 広一 木下, Hidenori Kitamura, 秀紀 北村
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code (LAN) , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JP2004153732A
CLAIM 2
The care facility monitoring system according to claim 1, wherein said sound-tracking cameras are arranged at least in the care recipients' rooms; a LAN (search stage code) is built within said care facility; said sound-tracking cameras, together with a central processing unit, are connected to this LAN; and video and audio collected by said sound-tracking cameras are stored in said central processing unit via the LAN.

US7979277B2
CLAIM 15
. A speech recognition method (システム) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2004153732A
CLAIM 1
A care facility monitoring system (システム: speech recognition method) characterized in that sound-tracking cameras are arranged throughout a care facility; in response to sound above a certain level originating from a care recipient in the facility, each sound-tracking camera is made to track toward the care recipient who is the sound source; and video collected by the sound-tracking cameras is configured to be viewable only by persons whose access rights have been registered in advance.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (システム) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2004153732A
CLAIM 1
A care facility monitoring system (システム: speech recognition method) characterized in that sound-tracking cameras are arranged throughout a care facility; in response to sound above a certain level originating from a care recipient in the facility, each sound-tracking camera is made to track toward the care recipient who is the sound source; and video collected by the sound-tracking cameras is configured to be viewable only by persons whose access rights have been registered in advance.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20030110033A1

Filed: 2002-10-22     Issued: 2003-06-12

Method and system for real-time speech recognition

(Original Assignee) Dspfactory Ltd     (Current Assignee) AMI Semiconductor Inc

Hamid Sheikhzadeh-Nadjar, Etienne Cornu, Robert Brennan, Nicolas Destrez, Alain Dufaux
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector (Discrete Cosine Transform) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (first processor) , and said calculating circuit is implemented using a second processor (second processor) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 4
. The system as claimed in claim 2 , wherein the processor units includes a first processor (first processor) unit for performing Fast Fourier Transform (FFT) and bin energy factorization , and one or more second processor (second processor, dynamic scheduling) units for determining FFT band energies , energy bins and Mel Frequency Cepstrum Coefficient (MFCC) coefficients .

US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .
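Claims 4 and 8 of US20030110033A1 above split the MFCC front end between FFT/bin-energy factorization and an (I)DCT over the band energies. A minimal sketch of that final cepstral step, assuming the mel band energies have already been computed upstream (the energy values below are hypothetical):

```python
import math

def mfcc_from_band_energies(energies, n_ceps):
    """Sketch of the cepstral step referenced in the chart: log-compress
    mel band energies, then apply a DCT-II to obtain the first n_ceps
    Mel Frequency Cepstrum Coefficients (the 'IDCT' of US20030110033A1)."""
    logs = [math.log(e) for e in energies]
    n = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / n)
                for j in range(n))
            for i in range(n_ceps)]

# Hypothetical band energies chosen so the log energies are 0, 1, 2, 3.
ceps = mfcc_from_band_energies([1.0, math.e, math.e ** 2, math.e ** 3], 2)
```

Coefficient 0 is the sum of the log energies (overall energy), while higher coefficients capture the spectral envelope's shape; a production front end would add windowing, pre-emphasis, and liftering around this core.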

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (first processor) .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 4
. The system as claimed in claim 2 , wherein the processor units includes a first processor (first processor) unit for performing Fast Fourier Transform (FFT) and bin energy factorization , and one or more second processor units for determining FFT band energies , energy bins and Mel Frequency Cepstrum Coefficient (MFCC) coefficients .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling (second processor) whether the first processor (first processor) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 4
. The system as claimed in claim 2 , wherein the processor units includes a first processor (first processor) unit for performing Fast Fourier Transform (FFT) and bin energy factorization , and one or more second processor (second processor, dynamic scheduling) units for determining FFT band energies , energy bins and Mel Frequency Cepstrum Coefficient (MFCC) coefficients .

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor (first processor) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 4
. The system as claimed in claim 2 , wherein the processor units includes a first processor (first processor) unit for performing Fast Fourier Transform (FFT) and bin energy factorization , and one or more second processor units for determining FFT band energies , energy bins and Mel Frequency Cepstrum Coefficient (MFCC) coefficients .

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means (calculating step) adapted to implement frame dropping , to discard one or more audio time frames .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

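The "frame dropping" control means recited in claim 6 above can be illustrated with a minimal sketch. This is not either patent's implementation; the `FrameDropController` class, the `max_backlog` threshold, and the queue-based backlog test are all hypothetical stand-ins for the claimed behavior of discarding audio time frames when downstream processing falls behind.

```python
from collections import deque

class FrameDropController:
    """Hypothetical sketch of a claim-6 style 'control means': when the
    downstream stage falls behind (queue backlog exceeds max_backlog),
    newly arriving audio time frames are discarded rather than queued."""

    def __init__(self, max_backlog=4):
        self.max_backlog = max_backlog
        self.queue = deque()
        self.dropped = 0

    def push_frame(self, frame):
        if len(self.queue) >= self.max_backlog:
            self.dropped += 1  # frame dropping: discard this time frame
            return False
        self.queue.append(frame)
        return True

    def pop_frame(self):
        # Downstream stage consumes the oldest queued frame, if any.
        return self.queue.popleft() if self.queue else None

ctrl = FrameDropController(max_backlog=2)
accepted = [ctrl.push_frame(i) for i in range(4)]  # only 2 fit; 2 dropped
```

Under these assumptions, frames 0 and 1 are queued and frames 2 and 3 are discarded, so recognition latency is bounded at the cost of skipped input.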
US20030110033A1
CLAIM 18
. A method of claim 17 , wherein the extracting step includes the step of : performing Fast Fourier Transform (FFT) ;
calculating an Inverse Discrete Cosine Transform (IDCT) based on FFT band energies and generating a Mel Frequency Cepstrum Coefficient (MFCC) based on the IDCT ;
and performing bin energy factorization using vector multiplication which multiples the FFT band energies by a vector , the performing step and the calculating step (control means) being implemented substantially in parallel to the creating step for a next frame .

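The FFT / band-energy / IDCT / MFCC sequence recited in claim 18 above can be sketched as follows. This is an illustrative reconstruction, not code from either patent: the naive DFT stands in for the claimed FFT, the rectangular bands stand in for mel-spaced filters, and the band counts and coefficient counts are arbitrary.

```python
import math

def power_spectrum(frame):
    # Naive DFT power spectrum (stands in for the claim's FFT step).
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(-2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(-2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def band_energies(spec, n_bands):
    # Crude rectangular bands; a real front end would use mel-spaced filters.
    step = max(1, len(spec) // n_bands)
    return [sum(spec[i:i + step]) + 1e-10 for i in range(0, step * n_bands, step)]

def mfcc(frame, n_bands=8, n_coeffs=5):
    # Log band energies followed by a type-II DCT (the 'IDCT' in the
    # prior-art claim's wording) yield the cepstral coefficients.
    logs = [math.log(e) for e in band_energies(power_spectrum(frame), n_bands)]
    return [sum(logs[m] * math.cos(math.pi * c * (m + 0.5) / n_bands)
                for m in range(n_bands)) for c in range(n_coeffs)]

frame = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
coeffs = mfcc(frame)
```

The claim's point of novelty is not the MFCC math itself but running the transform and factorization steps in parallel with framing of the next input, which the sketch does not attempt to show.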
US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector (Discrete Cosine Transform) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal (second buffer) to inform the front end that the accelerator is ready to receive a next feature vector (Discrete Cosine Transform) from the front end .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 6
. The system as claimed in claim 5 , wherein the first processor unit stores the results of the vector multiplication in a first buffer , the one or more second processor units subtracting the data in the first buffer from an original FFT band energies , and storing the subtraction results in a second buffer (interrupt signal) .

US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .

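The claim-9 handshake — an accelerator raising an interrupt to tell the front end it can accept the next feature vector — can be modelled in software as below. This is a hypothetical sketch: `threading.Event` stands in for the hardware interrupt line, and the squaring loop stands in for the accelerator's distance computation.

```python
import threading
import queue

class AcceleratorStub:
    """Hypothetical model of the claim-9 handshake: a 'ready' signal
    (an Event in place of a hardware interrupt) tells the front end the
    accelerator can accept the next feature vector."""

    def __init__(self):
        self.ready = threading.Event()
        self.ready.set()          # ready for the first vector
        self.results = queue.Queue()

    def submit(self, feature_vector):
        self.ready.wait()         # front end blocks until the interrupt fires
        self.ready.clear()
        # Stand-in for the distance computation, then signal readiness again.
        self.results.put([v * v for v in feature_vector])
        self.ready.set()

acc = AcceleratorStub()
for vec in ([1.0, 2.0], [3.0, 4.0]):
    acc.submit(vec)
out = [acc.results.get() for _ in range(2)]
```

The double-buffer arrangement in the prior-art claim 6 quoted above plays an analogous pacing role: the first and second buffers decouple producer and consumer so neither stalls the other.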
US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

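The Mahalanobis distance of claim 13 has a simple closed form when the acoustic state's Gaussian has a diagonal covariance, the usual case in speech recognizers. The following sketch is illustrative only; the function name and the example numbers are not drawn from either patent.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance between feature vector x and a Gaussian acoustic
    state with per-dimension means and variances (diagonal covariance)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

d = mahalanobis_diag([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

Here the second dimension's larger variance down-weights its deviation, which is the property that makes this distance a likelihood proxy for Gaussian states.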
US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector (Discrete Cosine Transform) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector (Discrete Cosine Transform) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words (Discrete Cosine Transform) within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector (Discrete Cosine Transform) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US20030110033A1
CLAIM 1
. A system for recognizing speech (speech recognition circuit, word identification) in real-time , the system comprising : an input processor for receiving samples of speech and creating a frame ;
and at least two processor units having functionality of feature extraction and pattern matching , the functionality being divided and assigned to each processor unit , the processor units operating sequentially or substantially in parallel .

US20030110033A1
CLAIM 8
. The system as claimed in claim 6 , wherein the one or more second processor units calculate the Inverse Discrete Cosine Transform (feature vector, search stage to identify words) (IDCT) based on the results of the mapping .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20030050783A1

Filed: 2002-09-12     Issued: 2003-03-13

Terminal device, server device and speech recognition method

(Original Assignee) Individual     (Current Assignee) Panasonic Holdings Corp

Shinichi Yoshizawa
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (determining means) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030050783A1
CLAIM 3
. The terminal device according to claim 1 , further comprising : a determining means (calculating distances) for comparing similarity between the voice of the user having the environmental noises added thereto and an acoustic model which has already been stored in the first storage means with a predetermined threshold value , wherein if the similarity is smaller than the threshold value , the transmitting means transmits the voice of the user and the environmental noises to the server device .

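The pipelined data flow of claim 1 above — front end to distance calculator to search stage — can be sketched with threads joined by queues. This is a minimal illustration, not Zentian's implementation: the doubling, summing, and list-append bodies are arbitrary stand-ins for feature extraction, distance computation, and lexical-tree search.

```python
import threading
import queue

def run_pipeline(frames):
    """Three pipeline stages as threads joined by queues, so frame n+1 can be
    featurised while frame n is being scored (claim 1's pipelined data flow)."""
    q_feat, q_dist, out = queue.Queue(), queue.Queue(), []

    def front_end():
        for f in frames:
            q_feat.put([x * 2 for x in f])   # stand-in feature extraction
        q_feat.put(None)                     # end-of-stream marker

    def calculator():
        while (v := q_feat.get()) is not None:
            q_dist.put(sum(v))               # stand-in distance computation
        q_dist.put(None)

    def search():
        while (d := q_dist.get()) is not None:
            out.append(d)                    # stand-in word search

    threads = [threading.Thread(target=t) for t in (front_end, calculator, search)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

scores = run_pipeline([[1, 2], [3, 4]])
```

Claim 1 additionally fixes which processor hosts which stage (front end and search on the first, distance calculation on the second); the sketch leaves thread-to-core placement to the runtime.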
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20030050783A1
CLAIM 1
. A terminal device , comprising : a transmitting means for transmitting a voice produced by a user and environmental noises to a server device ;
a receiving means for receiving from the server device an acoustic model adapted to the voice of the user and the environmental noises ;
a first storage means for storing the acoustic model received by the receiving means ;
and a speech recognition means (speech accelerator) for conducting speech recognition using the acoustic model stored in the first storage means .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030050783A1
CLAIM 22
. A speech recognition method (speech recognition method) , comprising the steps of : preparing a plurality of acoustic models each adapted to a corresponding speaker , a corresponding environment , and a corresponding tone of voice ;
obtaining an acoustic model adapted to a voice produced by a user and environmental noises , based on the voice of the user , the environmental noises and the plurality of acoustic models ;
and conducting speech recognition using the obtained acoustic model .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20030050783A1
CLAIM 22
. A speech recognition method (speech recognition method) , comprising the steps of : preparing a plurality of acoustic models each adapted to a corresponding speaker , a corresponding environment , and a corresponding tone of voice ;
obtaining an acoustic model adapted to a voice produced by a user and environmental noises , based on the voice of the user , the environmental noises and the plurality of acoustic models ;
and conducting speech recognition using the obtained acoustic model .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004096520A

Filed: 2002-09-02     Issued: 2004-03-25

Voice recognition remote control (音声認識リモコン)

(Original Assignee) Hosiden Corp

Shunji Muraoka, Shogo Kubota
US7979277B2
CLAIM 1
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.

JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 2
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 3
. A speech recognition circuit (音声信号) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 4
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 5
. A speech recognition circuit (音声信号) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 6
. The speech recognition circuit (音声信号) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 7
. The speech recognition circuit (音声信号) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (データ) for a predetermined time frame .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.

JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 8
. The speech recognition circuit (音声信号) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 9
. The speech recognition circuit (音声信号) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 10
. The speech recognition circuit (音声信号) of claim 1 , wherein the accelerator signals (アクティブ) to the search stage when the distances for a new frame are available in a result memory .
JP2004096520A
CLAIM 2
The voice recognition remote control according to claim 1, wherein the voice recognition module has an operating mode in which voice input is accepted and a sleep mode in which voice input is not accepted, and is switched from the sleep mode to the operating mode by an activation (accelerator signals) signal sent from the remote control module in response to a specific key input.

JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 11
. The speech recognition circuit (音声信号) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 12
. The speech recognition circuit (音声信号) of claim 1 , wherein the audio front end is configured to input a digital audio signal (データ) .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.

JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 13
. A speech recognition circuit (音声信号) of claim 1 , wherein said distance comprises a Mahalanobis distance .
JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 14
. A speech recognition circuit (音声信号) , comprising : an audio front end for calculating a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.

JP2004096520A
CLAIM 3
The voice recognition remote control according to claim 2, wherein the operating mode of the voice recognition module comprises a high operating mode in which an input voice signal (speech recognition circuit) is processed at a high sampling rate and a low operating mode in which the input voice signal is processed at a low sampling rate, the module being automatically switched to the low operating mode when the input voice signal varies little and to the high operating mode when the input voice signal varies greatly.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (データ) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2004096520A
CLAIM 1
A voice recognition remote control comprising: a voice recognition module which generates, based on voice input, control data (audio signal) associated with a part of a group of remote control codes to be transmitted to a controlled device; and a remote control module which generates, based on key input, control data associated with the remaining part of said group of remote control codes; wherein the voice recognition module is maintained in a power-saving state when not performing voice recognition.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20030046073A1

Filed: 2002-08-22     Issued: 2003-03-06

Word predicting method, voice recognition method, and voice recognition apparatus and program using the same methods

(Original Assignee) International Business Machines Corp     (Current Assignee) Nuance Communications Inc

Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (recognition method) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition, word string) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (recognition method) from the front end .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string (speech accelerator) by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .
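Claim 14's pipelined data flow (front end, then distance calculation, then search stage) can be sketched as three concurrent stages joined by bounded queues. This is an illustrative assumption only, not the patent's hardware implementation; the stage functions are placeholders supplied by the caller:

```python
import queue
import threading

SENTINEL = object()  # marks end of the frame stream

def run_stage(fn, src, dst):
    # Each stage consumes from its input queue and feeds the next stage,
    # so all three stages can work on different frames at once.
    while True:
        item = src.get()
        if item is SENTINEL:
            dst.put(SENTINEL)
            return
        dst.put(fn(item))

def pipeline(frames, front_end, distance_calc, search):
    """Run frames through front_end -> distance_calc -> search concurrently."""
    q0, q1, q2, out = (queue.Queue(maxsize=4) for _ in range(4))
    stages = [(front_end, q0, q1), (distance_calc, q1, q2), (search, q2, out)]
    threads = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for f in frames:
        q0.put(f)
    q0.put(SENTINEL)
    results = []
    while True:
        r = out.get()
        if r is SENTINEL:
            break
        results.append(r)
    for t in threads:
        t.join()
    return results
```

The bounded queues model the back-pressure implicit in pipelined hardware: an upstream stage stalls when the downstream stage has not yet consumed its output.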

US7979277B2
CLAIM 15
. A speech recognition method (probability distribution) , comprising : calculating a feature vector (recognition method) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .

US20030046073A1
CLAIM 9
. A data processing method comprising the steps of : acquiring a processing history of a tree structure to be used in predicting a predetermined element from a history storage unit storing said processing history for an array ;
acquiring a stochastic model from a stochastic model storage unit storing the stochastic model for the tree structure having predetermined partial trees and a probability distribution (speech recognition method) associated with said partial trees at nodes ;
and retrieving nodes corresponding to the tree structure of said processing history for said stochastic model , and predicting said predetermined element based on said probability distribution associated with said nodes .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (probability distribution) , the code comprising : code for controlling the processor to calculate a feature vector (recognition method) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20030046073A1
CLAIM 7
. A voice recognition method (feature vector) of recognizing a voice signal as a word string by using a computer , comprising the steps of : making an arithmetical operation on the voice signal to be processed , using an acoustic model and selecting a word as a recognition candidate resulted from the arithmetical operation ;
specifying a sentence structure of a history up to the word immediately before the word to be predicted for said selected word as an object ;
and predicting said word to be predicted based on a context tree having the information about possible structures of a sentence and a probability of appearance of words with respect to said structures at nodes and the sentence structure of said history .

US20030046073A1
CLAIM 9
. A data processing method comprising the steps of : acquiring a processing history of a tree structure to be used in predicting a predetermined element from a history storage unit storing said processing history for an array ;
acquiring a stochastic model from a stochastic model storage unit storing the stochastic model for the tree structure having predetermined partial trees and a probability distribution (speech recognition method) associated with said partial trees at nodes ;
and retrieving nodes corresponding to the tree structure of said processing history for said stochastic model , and predicting said predetermined element based on said probability distribution associated with said nodes .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1423846A1

Filed: 2002-08-07     Issued: 2004-06-02

Method and apparatus for speech analysis

(Original Assignee) VoiceSense Ltd     (Current Assignee) VoiceSense Ltd

Yoav Degani, Yishai Zamir
US7979277B2
CLAIM 1
. A speech recognition circuit (voice segment) , comprising : an audio front end (external output) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (cellular telephone) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (sampled data) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 12
. The method according to claim 11 , wherein said quality of voice segment is determined , based on noise level , size of sampled data (first processor) , and quality of sampled data .

EP1423846A1
CLAIM 17
. The apparatus according to claim 15 or 16 , wherein said voice input unit includes at least one of the following : a microphone , an interface to an audio player , an interface to a wired , wireless or cellular telephone (calculating circuit) , an interface to the Internet or other network , an interface to a computer , an interface to an electronic personal organizer or to any other electronic equipment , or an interface to a toy .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 2
. A speech recognition circuit (voice segment) as claimed in claim 1 , wherein the pipelining comprises alternating of front end (external output) and search stage processing on the first processor (sampled data) .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 12
. The method according to claim 11 , wherein said quality of voice segment is determined , based on noise level , size of sampled data (first processor) , and quality of sampled data .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 3
. A speech recognition circuit (voice segment) as claimed in claim 1 , comprising dynamic scheduling whether the first processor (sampled data) should run the front end (external output) or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 12
. The method according to claim 11 , wherein said quality of voice segment is determined , based on noise level , size of sampled data (first processor) , and quality of sampled data .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 4
. A speech recognition circuit (voice segment) as claimed in claim 1 , wherein the first processor (sampled data) supports multi-threaded operation , and runs the search stage and front ends (external output) as separate threads .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 12
. The method according to claim 11 , wherein said quality of voice segment is determined , based on noise level , size of sampled data (first processor) , and quality of sampled data .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 5
. A speech recognition circuit (voice segment) as claimed in claim 1 , wherein the said calculating circuit (cellular telephone) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 17
. The apparatus according to claim 15 or 16 , wherein said voice input unit includes at least one of the following : a microphone , an interface to an audio player , an interface to a wired , wireless or cellular telephone (calculating circuit) , an interface to the Internet or other network , an interface to a computer , an interface to an electronic personal organizer or to any other electronic equipment , or an interface to a toy .

US7979277B2
CLAIM 6
. The speech recognition circuit (voice segment) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

US7979277B2
CLAIM 7
. The speech recognition circuit (voice segment) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .
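Claim 7's feature vector of "spectral components of an audio signal for a predetermined time frame" can be illustrated with a naive DFT over one frame. A sketch only: practical front ends typically use FFT-based mel-cepstral features, and `frame_spectrum` is a hypothetical helper, not anything disclosed in the cited documents:

```python
import cmath

def frame_spectrum(frame):
    """Naive DFT magnitudes for one audio time frame: a 'plurality of
    spectral components' in the sense of claim 7 (illustrative only)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2)]   # keep the non-redundant half-spectrum
```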

US7979277B2
CLAIM 8
. The speech recognition circuit (voice segment) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

US7979277B2
CLAIM 9
. The speech recognition circuit (voice segment) of claim 1 , wherein the speech accelerator (voice channel, equal length) has an interrupt signal to inform the front end (external output) that the accelerator is ready to receive a next feature vector from the front end .
EP1423846A1
CLAIM 1
. A method for determining emotional arousal of a subject by speech analysis , comprising the steps of : obtaining a speech sample ;
pre-processing the speech sample into silent and active speech segments and dividing the active speech segments into strings of equal length (speech accelerator, accelerator signals) blocks ;
said blocks having primary speech parameters including pitch and amplitude parameters ;
deriving a plurality of selected secondary speech parameters indicative of characteristics of equal-pitch , rising-pitch and falling-pitch trends in said strings of blocks ;
comparing said secondary speech parameters with predefined , subject independent values representing non-emotional speech to generate a processing result indicative of emotional arousal , and outputting said generated processed result to an output device .

EP1423846A1
CLAIM 8
. The method according to any of claims 1 to 7 , adapted for analyzing a speech signal including a plurality of interacting voices , further comprising : separating the interacting voices into separate voice channel (speech accelerator, accelerator signals) s ;
performing samples normalization for each channel of interest ;
performing data filtering for each channel of interest ;
performing noise-reduction for each channel of interest ;
performing silence and speech segmentation and dividing the speech segments into blocks for each channel of interest , and auto-correlation processing to calculate pitch and amplitude voice parameters per block for each channel of interest .

EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 10
. The speech recognition circuit (voice segment) of claim 1 , wherein the accelerator signals (voice channel, equal length) to the search stage when the distances for a new frame are available in a result memory (speech signal) .
EP1423846A1
CLAIM 1
. A method for determining emotional arousal of a subject by speech analysis , comprising the steps of : obtaining a speech sample ;
pre-processing the speech sample into silent and active speech segments and dividing the active speech segments into strings of equal length (speech accelerator, accelerator signals) blocks ;
said blocks having primary speech parameters including pitch and amplitude parameters ;
deriving a plurality of selected secondary speech parameters indicative of characteristics of equal-pitch , rising-pitch and falling-pitch trends in said strings of blocks ;
comparing said secondary speech parameters with predefined , subject independent values representing non-emotional speech to generate a processing result indicative of emotional arousal , and outputting said generated processed result to an output device .

EP1423846A1
CLAIM 8
. The method according to any of claims 1 to 7 , adapted for analyzing a speech signal (result memory) including a plurality of interacting voices , further comprising : separating the interacting voices into separate voice channel (speech accelerator, accelerator signals) s ;
performing samples normalization for each channel of interest ;
performing data filtering for each channel of interest ;
performing noise-reduction for each channel of interest ;
performing silence and speech segmentation and dividing the speech segments into blocks for each channel of interest , and auto-correlation processing to calculate pitch and amplitude voice parameters per block for each channel of interest .

EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

US7979277B2
CLAIM 11
. The speech recognition circuit (voice segment) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

US7979277B2
CLAIM 12
. The speech recognition circuit (voice segment) of claim 1 , wherein the audio front end (external output) is configured to input a digital audio signal .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 13
. A speech recognition circuit (voice segment) of claim 1 , wherein said distance comprises a Mahalanobis distance .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .
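Claim 13 names the Mahalanobis distance. For a Gaussian acoustic state with diagonal covariance (a common assumption in speech acoustic modelling, not stated in the claim itself), the distance between a feature vector and the state reduces to a variance-weighted Euclidean distance:

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance between feature vector x and an acoustic state
    with mean `mean` and diagonal covariance `var` (illustrative sketch)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))
```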

US7979277B2
CLAIM 14
. A speech recognition circuit (voice segment) , comprising : an audio front end (external output) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1423846A1
CLAIM 11
. The method according to any previous claim , further comprising calculating a reliability grade based on at least one factor selected from the list of : quality of voice segment (speech recognition circuit) ;
significance of emotional arousal decision , and consistency of specific segment results with results of previous speech segments .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 15
. A speech recognition method (speech segments) , comprising : calculating a feature vector from an audio signal using an audio front end (external output) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (cellular telephone) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1423846A1
CLAIM 1
. A method for determining emotional arousal of a subject by speech analysis , comprising the steps of : obtaining a speech sample ;
pre-processing the speech sample into silent and active speech segments (speech recognition method) and dividing the active speech segments into strings of equal length blocks ;
said blocks having primary speech parameters including pitch and amplitude parameters ;
deriving a plurality of selected secondary speech parameters indicative of characteristics of equal-pitch , rising-pitch and falling-pitch trends in said strings of blocks ;
comparing said secondary speech parameters with predefined , subject independent values representing non-emotional speech to generate a processing result indicative of emotional arousal , and outputting said generated processed result to an output device .

EP1423846A1
CLAIM 17
. The apparatus according to claim 15 or 16 , wherein said voice input unit includes at least one of the following : a microphone , an interface to an audio player , an interface to a wired , wireless or cellular telephone (calculating circuit) , an interface to the Internet or other network , an interface to a computer , an interface to an electronic personal organizer or to any other electronic equipment , or an interface to a toy .

EP1423846A1
CLAIM 22
. The apparatus according to any of claims 15 to 21 , wherein said pre-processing and said processing units are incorporated within a software tool capable of integrating with an external source of digitized voice input and with an external output (front end, front ends, audio front end) device .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech segments) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (said unit) .
EP1423846A1
CLAIM 1
. A method for determining emotional arousal of a subject by speech analysis , comprising the steps of : obtaining a speech sample ;
pre-processing the speech sample into silent and active speech segments (speech recognition method) and dividing the active speech segments into strings of equal length blocks ;
said blocks having primary speech parameters including pitch and amplitude parameters ;
deriving a plurality of selected secondary speech parameters indicative of characteristics of equal-pitch , rising-pitch and falling-pitch trends in said strings of blocks ;
comparing said secondary speech parameters with predefined , subject independent values representing non-emotional speech to generate a processing result indicative of emotional arousal , and outputting said generated processed result to an output device .

EP1423846A1
CLAIM 20
. The apparatus according to any of claims 15 to 19 , wherein all said unit (word identification) s are installed on a small , mobile , DSP chip based unit .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
WO2004015686A1

Filed: 2002-08-01     Issued: 2004-02-19

Method for automatic speech recognition

(Original Assignee) Telefonaktiebolaget Lm Ericsson (Publ)     

Ralph Schleifer, Andreas Kiessling, Hans-Günter Hirsch
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (memory part) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
WO2004015686A1
CLAIM 15
. An automatic speech recognition device 100 , implemented the method according to any of the claims 1-12 , including - a pre-processing part (110) , where a digital signal from an utterance , spoken into a microphone (210) and transformed in an A/D converter 220 is transformable in a parametric description ;
- a memory part (first processor) (130) , where keyword models , SIL models , garbage models and garbage sequence models are storable ;
- a pattern matcher (120) , where the parametric description of the spoken utterance is comparable with the stored keyword models , SIL models , garbage models and garbage sequence models ;
- a controller part (140) , where in combination with the pattern matcher (120) and the memory part (130) , the method for automatic speech recognition is executable .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (memory part) .
WO2004015686A1
CLAIM 15
. An automatic speech recognition device 100 , implemented the method according to any of the claims 1-12 , including - a pre-processing part (110) , where a digital signal from an utterance , spoken into a microphone (210) and transformed in an A/D converter 220 is transformable in a parametric description ;
- a memory part (first processor) (130) , where keyword models , SIL models , garbage models and garbage sequence models are storable ;
- a pattern matcher (120) , where the parametric description of the spoken utterance is comparable with the stored keyword models , SIL models , garbage models and garbage sequence models ;
- a controller part (140) , where in combination with the pattern matcher (120) and the memory part (130) , the method for automatic speech recognition is executable .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (memory part) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
WO2004015686A1
CLAIM 15
. An automatic speech recognition device 100 , implemented the method according to any of the claims 1-12 , including - a pre-processing part (110) , where a digital signal from an utterance , spoken into a microphone (210) and transformed in an A/D converter 220 is transformable in a parametric description ;
- a memory part (first processor) (130) , where keyword models , SIL models , garbage models and garbage sequence models are storable ;
- a pattern matcher (120) , where the parametric description of the spoken utterance is comparable with the stored keyword models , SIL models , garbage models and garbage sequence models ;
- a controller part (140) , where in combination with the pattern matcher (120) and the memory part (130) , the method for automatic speech recognition is executable .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (memory part) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
WO2004015686A1
CLAIM 15
. An automatic speech recognition device 100 , implemented the method according to any of the claims 1-12 , including - a pre-processing part (110) , where a digital signal from an utterance , spoken into a microphone (210) and transformed in an A/D converter 220 is transformable in a parametric description ;
- a memory part (first processor) (130) , where keyword models , SIL models , garbage models and garbage sequence models are storable ;
- a pattern matcher (120) , where the parametric description of the spoken utterance is comparable with the stored keyword models , SIL models , garbage models and garbage sequence models ;
- a controller part (140) , where in combination with the pattern matcher (120) and the memory part (130) , the method for automatic speech recognition is executable .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (keyword model) , to the distance calculation , and to the word identification .
WO2004015686A1
CLAIM 1
. Method for recognizing a keyword from a spoken utterance , with at least one keyword model (feature calculation) and a plurality of garbage models , wherein a part of the spoken utterance is assessed as the keyword to be recognized , if that part matches best either to the keyword model or to a garbage sequence model , and wherein the garbage sequence model is a series of consecutive garbage models from that plurality of garbage models .
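Claim 16 recites that data is pipelined from feature calculation to distance calculation to word identification. As a minimal sketch of that three-stage flow (the function names, toy feature extraction, and two-state acoustic model below are our own illustrative assumptions, not from either patent):

```python
# Hypothetical sketch of the three-stage pipeline in US7979277B2 claim 16:
# feature calculation -> distance calculation -> word identification.
# All names and the toy model are illustrative, not from the patent.

def feature_frames(samples, frame_len=4):
    """Split the audio signal into time frames; emit one feature vector per frame."""
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        # Toy "feature vector": mean and peak of the frame.
        yield (sum(frame) / frame_len, max(frame))

def distances(features, states):
    """For each feature vector, emit its distance to every acoustic state."""
    for f in features:
        yield [sum((a - b) ** 2 for a, b in zip(f, s)) for s in states]

def best_states(dist_stream):
    """Search-stage stand-in: pick the closest acoustic state per frame."""
    for d in dist_stream:
        yield min(range(len(d)), key=d.__getitem__)

STATES = [(0.0, 0.0), (1.0, 2.0)]          # two toy acoustic states
audio = [0.0, 0.1, 0.0, 0.1, 1.0, 2.0, 1.0, 2.0]
path = list(best_states(distances(feature_frames(audio), STATES)))
print(path)  # one best-matching state index per audio frame
```

Because each stage is a generator, a frame flows through all three stages without waiting for the whole utterance, which is the pipelining the claim describes.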




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2004012653A

Filed: 2002-06-05     Issued: 2004-01-15

Speech recognition system, speech recognition client, speech recognition server, speech recognition client program, and speech recognition server program (音声認識システム、音声認識クライアント、音声認識サーバ、音声認識クライアントプログラムおよび音声認識サーバプログラム)

(Original Assignee) Matsushita Electric Industrial Co Ltd     (Current Assignee) Panasonic Holdings Corp

Takashi Akiyama (秋山 貴), Norihiko Kumon (久門 紀彦)
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (前記音声認識システム) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2004012653A
CLAIM 3
The speech recognition system according to claim 1 or claim 2, characterized in that, when the data received by the second receiving means is primary recognition result data, the server does not perform secondary speech recognition with the second speech recognition means, but obtains the primary recognition result data as the recognition result data of the speech recognition system (前記音声認識システム: first processor, processor pursuant).
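Claim 1's architecture places the front end and the search stage on a first processor and the distance calculation on a second, with data pipelined between them. A hedged sketch, modelling the two processors as threads joined by queues (the scalar "features" and two-state model are our assumptions, not the patent's):

```python
# Minimal sketch (ours, not the patent's implementation) of claim 1's split:
# a "first processor" thread runs front end and search, a "second processor"
# thread runs the distance calculation, linked by queues.
import threading
import queue

feat_q, dist_q = queue.Queue(), queue.Queue()
STATES = [0.0, 3.0]                 # two toy acoustic states

def second_processor():
    # Distance calculation runs concurrently on its own "processor".
    while True:
        f = feat_q.get()
        if f is None:               # end-of-stream sentinel
            dist_q.put(None)
            return
        dist_q.put([abs(f - s) for s in STATES])

def first_processor(frames, result):
    t = threading.Thread(target=second_processor)
    t.start()
    # Front end: emit one toy feature per audio frame.
    for f in frames:
        feat_q.put(f)
    feat_q.put(None)
    # Search stage: consume distances, keep the best state per frame.
    while True:
        d = dist_q.get()
        if d is None:
            break
        result.append(min(range(len(d)), key=d.__getitem__))
    t.join()

out = []
first_processor([0.2, 2.9, 0.1], out)
print(out)  # best-matching state index per frame
```

The queues are what make the data flow "pipelined": the second processor can compute distances for frame n while the first processor is already extracting features from frame n+1.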

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (前記音声認識システム) .
JP2004012653A
CLAIM 3
The speech recognition system according to claim 1 or claim 2, characterized in that, when the data received by the second receiving means is primary recognition result data, the server does not perform secondary speech recognition with the second speech recognition means, but obtains the primary recognition result data as the recognition result data of the speech recognition system (前記音声認識システム: first processor, processor pursuant).

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (前記音声認識システム) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JP2004012653A
CLAIM 3
The speech recognition system according to claim 1 or claim 2, characterized in that, when the data received by the second receiving means is primary recognition result data, the server does not perform secondary speech recognition with the second speech recognition means, but obtains the primary recognition result data as the recognition result data of the speech recognition system (前記音声認識システム: first processor, processor pursuant).
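Claim 3's dynamic scheduling decides, on each iteration, whether the first processor should run front-end or search-stage code, based on the availability of distance results and of buffer space. One possible scheduling rule is sketched below; the buffer size, the priority order, and the inline stand-in for the second processor are illustrative assumptions only:

```python
# Hedged sketch of claim 3's dynamic scheduling: one processor chooses each
# iteration between front-end and search-stage work, driven by whether
# distance results are available and whether feature-buffer space remains.

def schedule(audio_frames, max_pending=2):
    pending = []        # feature vectors awaiting distance results
    distances = []      # distance results awaiting the search stage
    trace = []          # which stage ran at each step
    frames = list(audio_frames)
    while frames or pending or distances:
        if distances:                        # distance results available
            distances.pop(0)                 # -> run the search stage
            trace.append("search")
        elif frames and len(pending) < max_pending:
            pending.append(frames.pop(0))    # buffer space left -> front end
            trace.append("front_end")
        else:
            # Neither stage can run: the (unmodelled) second processor
            # drains one pending feature into a distance result.
            distances.append(pending.pop(0))
    return trace

print(schedule([1, 2, 3]))
```

With a two-slot feature buffer the processor fills the buffer, stalls until a distance result arrives, then alternates, which matches the claim's "availability or unavailability" framing.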

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (前記音声認識システム) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JP2004012653A
CLAIM 3
The speech recognition system according to claim 1 or claim 2, characterized in that, when the data received by the second receiving means is primary recognition result data, the server does not perform secondary speech recognition with the second speech recognition means, but obtains the primary recognition result data as the recognition result data of the speech recognition system (前記音声認識システム: first processor, processor pursuant).

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (備えること) vector from the front end .
JP2004012653A
CLAIM 1
A speech recognition system comprising a server and a client, wherein the client comprises: speech analysis means for analysing input speech to generate speech data; first storage means for storing a first dictionary composed of a plurality of dictionary data for primary speech recognition; first speech recognition means for performing primary speech recognition using the speech data and the dictionary data of the first dictionary to generate primary recognition result data; first selection means for selecting, from the speech data or the primary recognition result data, the data to transmit to the server; and first transmission means for transmitting the data selected by the first selection means to the server; and the server comprises: second receiving means for receiving the data transmitted by the client; a second storage device for storing a second dictionary composed of a plurality of dictionary data for secondary speech recognition; and second speech recognition means for performing secondary speech recognition using the data received by the receiving means and the dictionary data of the second dictionary; the system being characterized by comprising (備えること: next feature) the above.

JP2004012653A
CLAIM 8
The speech recognition system according to claim 1, wherein the client further comprises speech identification means for performing speech identification (音声識別: next feature vector) using the speech data, identifying the speaker, and generating speaker data indicating who the speaker is; the first storage means stores each of the plurality of data of the first dictionary in association with a speaker; and the second storage means stores each of the plurality of data of the second dictionary in association with a speaker.
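Claim 9's accelerator raises an interrupt signal to tell the front end it is ready to receive the next feature vector. A minimal handshake sketch, with the interrupt modelled as a `threading.Event` (the vector contents and function names are our assumptions):

```python
# Sketch (ours) of claim 9's handshake: the accelerator signals "ready",
# the front end waits for that signal before handing over each feature vector.
import threading

ready = threading.Event()
ready.set()                       # accelerator starts out ready
received = []

def accelerator(vec):
    received.append(vec)          # consume the feature vector
    ready.set()                   # raise the "interrupt": ready for the next

def front_end(vectors):
    for v in vectors:
        ready.wait()              # block until the accelerator signals ready
        ready.clear()
        accelerator(v)            # hand over exactly one feature vector

front_end([[1.0], [2.0], [3.0]])
print(received)
```

The event gives the same flow control as the claimed interrupt line: the front end never pushes a vector the accelerator has not asked for.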

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant (前記音声認識システム) to the code from the feature calculation (演算手段, 記憶装置) , to the distance calculation , and to the word identification (識別手段) .
JP2004012653A
CLAIM 1
A speech recognition system comprising a server and a client, wherein the client comprises: speech analysis means for analysing input speech to generate speech data; first storage means for storing a first dictionary composed of a plurality of dictionary data for primary speech recognition; first speech recognition means for performing primary speech recognition using the speech data and the dictionary data of the first dictionary to generate primary recognition result data; first selection means for selecting, from the speech data or the primary recognition result data, the data to transmit to the server; and first transmission means for transmitting the data selected by the first selection means to the server; and the server comprises: second receiving means for receiving the data transmitted by the client; a second storage device (第2の記憶装置: feature calculation) for storing a second dictionary composed of a plurality of dictionary data for secondary speech recognition; and second speech recognition means for performing secondary speech recognition using the data received by the receiving means and the dictionary data of the second dictionary.

JP2004012653A
CLAIM 3
The speech recognition system according to claim 1 or claim 2, characterized in that, when the data received by the second receiving means is primary recognition result data, the server does not perform secondary speech recognition with the second speech recognition means, but obtains the primary recognition result data as the recognition result data of the speech recognition system (前記音声認識システム: first processor, processor pursuant).

JP2004012653A
CLAIM 8
The speech recognition system according to claim 1, wherein the client further comprises speech identification means (音声識別手段: word identification) for performing speech identification using the speech data, identifying the speaker, and generating speaker data indicating who the speaker is; the first storage means stores each of the plurality of data of the first dictionary in association with a speaker; and the second storage means stores each of the plurality of data of the second dictionary in association with a speaker.

JP2004012653A
CLAIM 14
A speech recognition system comprising a server and a client, wherein the client comprises: speech analysis means for analysing input speech to generate speech data; first storage means for storing a first dictionary composed of a first dictionary area and a second dictionary area that store a plurality of dictionary data for primary speech recognition; first speech recognition means for performing primary speech recognition, using the speech data and dictionary data stored in either the first dictionary area or the second dictionary area of the first dictionary, to generate primary recognition result data; first selection means for selecting, from the speech data or the primary recognition result data, the data to transmit to the server; first transmission means for transmitting the data selected by the first selection means to the server; first receiving means for receiving data transmitted by the server; and CPU monitoring means for monitoring the server-side CPU usage rate; and the server comprises: second receiving means for receiving the data transmitted by the client; a second storage device for storing a second dictionary composed of a plurality of dictionary data for secondary speech recognition; second speech recognition means for performing secondary speech recognition using the data received by the receiving means and the dictionary data of the second dictionary; second selection means for selecting, from a plurality of data generated at the server, the data to transmit to the client; second transmission means for transmitting the data selected by the second selection means to the client; and CPU usage rate calculation means (CPU使用率演算手段: feature calculation) for calculating the usage rate of the server-side CPU to generate server-side CPU usage rate data.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20020099542A1

Filed: 2002-03-18     Issued: 2002-07-25

Method and apparatus for processing the output of a speech recognition engine

(Original Assignee) AllVoice Computing PLC     (Current Assignee) ALLVOICE DEVELOPMENTS US LLC

John C. Mitchell, Alan James Heard, Steven Norman Corbett, Nicholas John Daniel
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (audio signal, audio data signal, second user) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio signal, audio data signal, second user) ;

a calculating circuit (control means, said input) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20020099542A1
CLAIM 1
. Data processing apparatus comprising : input means for receiving recognition data from a speech recognition engine and audio data , said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters ;
storage means for storing said audio data received from said input (calculating circuit, control means, distance calculation) means ;
processing means for receiving and processing the input recognised characters to at least one of replace , insert , move and position the recognised characters to form a processed character string ;
link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string ;
display means for displaying the characters received and processed by said processing means ;
user operable selection means for selecting characters in the displayed characters for audio playback , where said link data identifies any selected audio components , if present , which are linked to the selected characters ;
and audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string .

US20020099542A1
CLAIM 7
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine , the apparatus including status detection means for detecting said recognition status indicators , and display control means (calculating circuit, control means, distance calculation) to control said display means to display characters which are still being recognised differently to characters which have been recognised , said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data .

US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) s ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (control means, said input) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20020099542A1
CLAIM 1
. Data processing apparatus comprising : input means for receiving recognition data from a speech recognition engine and audio data , said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters ;
storage means for storing said audio data received from said input (calculating circuit, control means, distance calculation) means ;
processing means for receiving and processing the input recognised characters to at least one of replace , insert , move and position the recognised characters to form a processed character string ;
link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string ;
display means for displaying the characters received and processed by said processing means ;
user operable selection means for selecting characters in the displayed characters for audio playback , where said link data identifies any selected audio components , if present , which are linked to the selected characters ;
and audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string .

US20020099542A1
CLAIM 7
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine , the apparatus including status detection means for detecting said recognition status indicators , and display control means (calculating circuit, control means, distance calculation) to control said display means to display characters which are still being recognised differently to characters which have been recognised , said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (control means, said input) adapted to implement frame dropping (display control) , to discard one or more audio time frames .
US20020099542A1
CLAIM 1
. Data processing apparatus comprising : input means for receiving recognition data from a speech recognition engine and audio data , said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters ;
storage means for storing said audio data received from said input (calculating circuit, control means, distance calculation) means ;
processing means for receiving and processing the input recognised characters to at least one of replace , insert , move and position the recognised characters to form a processed character string ;
link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string ;
display means for displaying the characters received and processed by said processing means ;
user operable selection means for selecting characters in the displayed characters for audio playback , where said link data identifies any selected audio components , if present , which are linked to the selected characters ;
and audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string .

US20020099542A1
CLAIM 7
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine , the apparatus including status detection means for detecting said recognition status indicators , and display control (frame dropping) means to control said display means to display characters which are still being recognised differently to characters which have been recognised , said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data .
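Claim 6's control means implements frame dropping, discarding one or more audio time frames. A hedged sketch of one such control policy (the backlog model and threshold are our assumptions, not the patent's mechanism):

```python
# Illustrative frame-dropping control (claim 6): discard audio time frames
# when the downstream stage falls behind. Threshold and cost model are
# assumptions for the sake of the sketch.

def drop_frames(frames, backlog_per_frame, max_backlog=2):
    backlog, kept, dropped = 0, [], []
    for frame, cost in zip(frames, backlog_per_frame):
        if backlog >= max_backlog:
            dropped.append(frame)   # discard this time frame entirely
            backlog -= 1            # downstream catches up a little
        else:
            kept.append(frame)
            backlog += cost
    return kept, dropped

kept, dropped = drop_frames(["f0", "f1", "f2", "f3"], [1, 1, 1, 0])
print(kept, dropped)
```

Dropping whole frames keeps real-time operation at the cost of some input, which is the trade the claim's control means makes explicit.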

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (audio signal, audio data signal, second user) for a predetermined time frame .
US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) s ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .
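Claim 7's feature vector comprises a plurality of spectral components of the audio signal for one time frame. A minimal DFT-magnitude sketch (real front ends typically add windowing, FFTs, and mel filterbanks; this toy version is ours):

```python
# Toy spectral feature vector for one time frame (claim 7): magnitude of
# each DFT bin up to the Nyquist index. Illustrative only.
import cmath

def spectral_feature_vector(frame):
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2)]   # one magnitude per spectral bin

fv = spectral_feature_vector([0.0, 1.0, 0.0, -1.0])
print([round(v, 6) for v in fv])  # energy concentrates in bin 1
```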

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (audio signal, audio data signal, second user) has an interrupt signal (processing data) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 35
. A method of processing data (interrupt signal) comprising the steps of : at an author work station , carrying out the method as claimed in claim 23 wherein the recognised characters , the link data and the audio data are stored ;
and at an editor work station , obtaining the stored characters , link data and audio data from the author work station ;
inputting the characters into a processor ;
linking the audio data to the character component positions using the link data ;
displaying the characters being processed ;
selecting any displayed characters which have been incorrectly recognised ;
playing back any audio component corresponding to the selected characters to aid correction ;
correcting the incorrectly recognised characters ;
storing the corrected characters and the audio identifier for the audio component corresponding to the corrected character in a character correction file ;
and transferring the character correction file to the author work station for later updating of models used by said speech recognition engine ;
wherein , at a later time , said character correction file is read at said author work station to pass the data contained therein to said speech recognition engine for updating of said models .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) s ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (audio signal, audio data signal, second user) signal .
US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) s ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (audio signal, audio data signal, second user) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio signal, audio data signal, second user) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signals (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal (audio signal, audio data signal, second user) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio signal, audio data signal, second user) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (control means, said input) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
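The claim-15 step of calculating "a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame" can be sketched as below; the particular quantities (energy, mean amplitude, zero crossings) and the frame length are illustrative assumptions only, since practical front ends typically derive cepstral coefficients instead.

```python
def feature_vector(samples, frame_start, frame_len=160):
    """Toy audio front end: derive a few quantities from one audio
    time frame (here: energy, mean amplitude, zero-crossing count).
    The quantities and frame_len are illustrative, not the patent's."""
    frame = samples[frame_start:frame_start + frame_len]
    energy = sum(s * s for s in frame)
    mean_amp = sum(abs(s) for s in frame) / len(frame)
    zero_crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
    )
    return [energy, mean_amp, zero_crossings]
```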
US20020099542A1
CLAIM 1
. Data processing apparatus comprising : input means for receiving recognition data from a speech recognition engine and audio data , said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters ;
storage means for storing said audio data received from said input (calculating circuit, control means, distance calculation) means ;
processing means for receiving and processing the input recognised characters to at least one of replace , insert , move and position the recognised characters to form a processed character string ;
link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string ;
display means for displaying the characters received and processed by said processing means ;
user operable selection means for selecting characters in the displayed characters for audio playback , where said link data identifies any selected audio components , if present , which are linked to the selected characters ;
and audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string .

US20020099542A1
CLAIM 7
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine , the apparatus including status detection means for detecting said recognition status indicators , and display control means (calculating circuit, control means, distance calculation) to control said display means to display characters which are still being recognised differently to characters which have been recognised , said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data .

US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 72
. A speech recognition method (speech recognition method) comprising the steps of inputting speech data ;
selectively performing speech recognition on input speech data to generate the recognised characters ;
visibly outputting the recognised characters ;
storing recognised characters corresponding to a portion of input speech data as a file when speech recognition is performed ;
and storing a portion of the input speech data in association with a file of recognised characters as an audio message when speech recognition is not performed .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signals (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) ;
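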
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (audio signal, audio data signal, second user) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio signal, audio data signal, second user) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (control means, said input) , and to the word identification .
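The lexical tree recited in claims 1 and 14-16 stores a "model of words" in which words sharing initial phones share tree nodes, so a state score near the root is computed once for every word with that prefix. A minimal sketch (the phone labels and helper names are illustrative assumptions):

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences; a node carrying a
    "#word" entry ends a complete pronunciation. Words sharing initial
    phones share the same initial nodes."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})
        node["#word"] = word  # terminal marker
    return root

def words_with_prefix(tree, phones):
    """Collect every word reachable after following a phone prefix;
    this is the set a search stage keeps alive for that branch."""
    node = tree
    for p in phones:
        node = node.get(p)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        n = stack.pop()
        for k, v in n.items():
            if k == "#word":
                found.append(v)
            else:
                stack.append(v)
    return found
```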
US20020099542A1
CLAIM 1
. Data processing apparatus comprising : input means for receiving recognition data from a speech recognition engine and audio data , said recognition data including a string of recognised characters and audio identifiers identifying audio components corresponding to a character component of the recognised characters ;
storage means for storing said audio data received from said input (calculating circuit, control means, distance calculation) means ;
processing means for receiving and processing the input recognised characters to at least one of replace , insert , move and position the recognised characters to form a processed character string ;
link means for forming link data linking the audio identifiers to the character component positions in the character string and for updating said link data after processing to maintain the link between the audio identifiers and the character component positions in the processed character string ;
display means for displaying the characters received and processed by said processing means ;
user operable selection means for selecting characters in the displayed characters for audio playback , where said link data identifies any selected audio components , if present , which are linked to the selected characters ;
and audio playback means for playing back the selected audio components in the order of the character component positions in the character string or the processed character string .

US20020099542A1
CLAIM 7
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes recognition status indicators to indicate whether each recognised character is a character finally selected as recognised by said speech recognition engine or a character which is the most likely at that time but which is still being recognised by said speech recognition engine , the apparatus including status detection means for detecting said recognition status indicators , and display control means (calculating circuit, control means, distance calculation) to control said display means to display characters which are still being recognised differently to characters which have been recognised , said link means being responsive to said recognition status indicators to link the recognised characters to the corresponding audio component in the audio data .

US20020099542A1
CLAIM 9
. Data processing apparatus as claimed in any preceding claim wherein said recognition data includes a likelihood indicator for each character in the character string indicating the likelihood that the character is correct , and said link means stores the likelihood indicators , the apparatus including automatic error detection means for detecting possible errors in recognition of characters in the recognised characters by scanning the likelihood indicators in said link means for the recognised characters and detecting if the likelihood indicator for a character is below a likelihood threshold , whereby said display means highlights the character having a likelihood indicator below the likelihood threshold ;
second user (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) operable selection means for selecting a character to replace an incorrectly recognised character highlighted in the recognised characters ;
and correction means for replacing the incorrectly recognised character with the selected character to correct the recognised characters .

US20020099542A1
CLAIM 53
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining a string of characters and audio data to store the characters and the audio data , the instructions comprising instructions for a) causing the processor to receive the signals from a speech recognition engine ;
b) causing the processor to generate an image of the characters on a display ;
c) causing the processor to store the characters as a file ;
d) causing the processor to selectively disable one of the display and storage of the characters and the speech recognition engine for a period of time ;
and e) causing the processor to store the audio signal (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) for the period of time as an audio message associated with the file .

US20020099542A1
CLAIM 72
. A speech recognition method (speech recognition method) comprising the steps of inputting speech data ;
selectively performing speech recognition on input speech data to generate the recognised characters ;
visibly outputting the recognised characters ;
storing recognised characters corresponding to a portion of input speech data as a file when speech recognition is performed ;
and storing a portion of the input speech data in association with a file of recognised characters as an audio message when speech recognition is not performed .

US20020099542A1
CLAIM 75
. A computer usable medium having computer readable instructions stored therein for causing the processor in a data processing apparatus to process signals defining recognition data from a speech recognition engine and audio data to store the recognition data and the audio data , the instructions comprising instructions for a) causing the processor to receive audio data signals (speech accelerator, digital audio, audio signal, audio time frame, digital audio signal) ;
b) causing the processor to receive recognition data signals from a speech recognition engine ;
and c) selectively causing the processor to store the recognition data signals in a file and to store corresponding audio data signals in storage means , or to store the audio data signals for which there is no corresponding recognition data signals in association with a file of recognition data signals .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1493071A

Filed: 2002-02-20     Issued: 2004-04-28

Voice command discriminator for a speech recognition system [用于语音识别系统的语音命令鉴别器]

(Original Assignee) SUNWOO TECHNO Inc     (Current Assignee) SUNWOO TECHNO Inc

丁华镇
US7979277B2
CLAIM 1
. A speech recognition circuit (进行识别, 以识别) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 2
. A speech recognition circuit (进行识别, 以识别) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 3
. A speech recognition circuit (进行识别, 以识别) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 4
. A speech recognition circuit (进行识别, 以识别) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 5
. A speech recognition circuit (进行识别, 以识别) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 6
. The speech recognition circuit (进行识别, 以识别) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 7
. The speech recognition circuit (进行识别, 以识别) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 8
. The speech recognition circuit (进行识别, 以识别) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 9
. The speech recognition circuit (进行识别, 以识别) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 10
. The speech recognition circuit (进行识别, 以识别) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (判定结果) .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

CN1493071A
CLAIM 6
. A voice command discrimination method for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising a target signal contained in the electrical signal from the microphone, the method comprising the steps of: (1) determining whether a setup operation or a normal operation is to be performed; if the determination result [判定结果] (result memory) of step (1) is that the setup operation is to be performed, then (1-1) outputting a pulse of predetermined amplitude and width; and (1-2) after outputting the pulse, digitising the signal input to the microphone for a predetermined time, thereby obtaining an environment coefficient uniquely determined by the installation environment; if the determination result of step (1) is that the normal operation is to be performed, then (2-1) analogue-to-digital converting the signal output by the audio signal generator, thereby obtaining a digital signal; (2-2) multiplying the digital signal obtained in step (2-1) by the environment coefficient and accumulating the multiplication results; and (2-3) digital-to-analogue converting the accumulated result into an analogue signal and subtracting the analogue signal from the electrical signal output by the microphone, thereby generating the target signal.

US7979277B2
CLAIM 11
. The speech recognition circuit (进行识别, 以识别) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 12
. The speech recognition circuit (进行识别, 以识别) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 13
. A speech recognition circuit (进行识别, 以识别) of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.
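For claim 13, the Mahalanobis distance between a feature vector and a Gaussian acoustic state with diagonal covariance (the form commonly used in acoustic models) reduces to a variance-weighted Euclidean distance; a minimal sketch, where the function name and the diagonal-covariance assumption are illustrative rather than taken from the patent:

```python
import math

def mahalanobis_distance(feature, mean, variance):
    """Distance between a feature vector and an acoustic state modelled
    as a Gaussian with diagonal covariance: each dimension's squared
    deviation is weighted by the inverse of that dimension's variance."""
    return math.sqrt(sum(
        (f - m) ** 2 / v for f, m, v in zip(feature, mean, variance)
    ))
```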

US7979277B2
CLAIM 14
. A speech recognition circuit (进行识别, 以识别) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 15
. A speech recognition method (进行识别, 以识别) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (进行识别, 以识别) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (进行识别, 以识别) .
CN1493071A
CLAIM 1
. A voice command discriminator for a sound output system, the sound output system having an internal circuit that performs predetermined functions; an audio signal generator that generates a sound signal of audible frequency from a signal supplied by the internal circuit; a speaker that outputs the sound signal as audible sound; a microphone that receives external sound and converts it into an electrical signal; and a sound recogniser for recognising [以识别] (word identification, speech recognition circuit, speech recognition method) a target signal contained in the electrical signal from the microphone, the voice command discriminator comprising: a memory having a predetermined storage capacity; a microprocessor for managing the memory and generating at least one control signal; a first analogue-to-digital converter which, under control of the microprocessor, receives the sound signal from the audio signal generator and converts it into a digital signal; an adder which, under control of the microprocessor, receives the electrical signal from the microphone and outputs the target signal to be recognised [进行识别] (word identification, speech recognition circuit, speech recognition method) by the sound recogniser; a second analogue-to-digital converter for receiving the target signal and converting it into a digital signal; first and second digital-to-analogue converters, each of which, under control of the microprocessor, converts data read from the memory into an analogue signal; and an output selection switch which, under control of the microprocessor, selects one of the signal output by the second digital-to-analogue converter and the signal output by the audio signal generator.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
GB2391679A

Filed: 2002-02-04     Issued: 2004-02-11

Speech recognition circuit using parallel processors

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Mark Catchpole
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end (search algorithm) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (feature vectors) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
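Claim 1 splits the work across two processors: the front end and search stage on a first processor, the distance-calculating circuit on a second, with data pipelined between them. A minimal software analogue of that partitioning, using a worker thread and queues in place of the second processor (all names, the one-dimensional features, and the two-state model are assumptions for illustration):

```python
# Illustrative sketch, not the patented circuit: the front end and search
# stage share one flow of control while the distance calculation runs on
# another, with queues carrying the pipelined data between them.
import threading
import queue

feature_q, distance_q = queue.Queue(), queue.Queue()
STATES = [(0.0,), (1.0,)]  # toy "predetermined acoustic states"

def calculating_circuit():
    # Stand-in for the second processor: feature vectors in, distances out.
    while True:
        vec = feature_q.get()
        if vec is None:            # end-of-stream marker
            distance_q.put(None)
            return
        distance_q.put([abs(vec[0] - s[0]) for s in STATES])

def front_end_and_search(frames, results):
    # Stand-in for the first processor: emit features, consume distances.
    for f in frames:
        feature_q.put((sum(f) / len(f),))
    feature_q.put(None)
    while True:
        d = distance_q.get()
        if d is None:
            return
        results.append(min(range(len(d)), key=d.__getitem__))

results = []
worker = threading.Thread(target=calculating_circuit)
worker.start()
front_end_and_search([[0.1, 0.1], [0.9, 1.1]], results)
worker.join()
```

The queues model the pipelined hand-off the claim describes: the "first processor" never computes distances itself, it only produces feature vectors and consumes distance results.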
GB2391679A
CLAIM 19
. A speech recognition circuit according to any preceding claim , wherein said input means is arranged to receive said speech parameters as feature vectors (calculating distances) .

GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .
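GB2391679A's claim 23 has each processor run the Viterbi search over HMM lexical data. For readers unfamiliar with that step, here is a compact Viterbi decode over a toy two-state HMM; the states, probabilities, and observation symbols are invented for the example and do not come from either patent.

```python
# Minimal Viterbi decode over a toy HMM (probabilities kept linear, not
# log-space, for brevity). The model below is invented for illustration.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # score[s]: best probability of any state path ending in s so far.
    score = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_score, new_path = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] * trans_p[p][s])
            new_score[s] = score[prev] * trans_p[prev][s] * emit_p[s][o]
            new_path[s] = path[prev] + [s]
        score, path = new_score, new_path
    best = max(states, key=score.get)
    return path[best]

states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
best_path = viterbi(("low", "high", "high"), states, start_p, trans_p, emit_p)
```

In the GB claim's arrangement, each processor would run this kind of search against only its own partition of the lexical data, with the control processor merging results.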

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end (search algorithm) and search stage processing on the first processor .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end (search algorithm) or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (said input) adapted to implement frame dropping , to discard one or more audio time frames .
GB2391679A
CLAIM 1
. A speech recognition circuit comprising : input means for receiving processed speech parameters ;
a plurality of lexical memory means containing in combination complete lexical data for word recognition , each lexical memory means containing part of said complete lexical data ;
a plurality of processors connected in parallel to said input (control means) means for processing the speech parameters in parallel , said processors being arranged in groups of processors , each group of processors being connected to a lexical memory means ;
control processor means for controlling each processor to process said speech parameters using partial lexical data read from a said lexical memory means ;
and receiving means for receiving the results of the processing of the speech parameters from said processors .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end (search algorithm) that the accelerator is ready to receive a next feature vector from the front end .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end (search algorithm) is configured to input a digital audio signal .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end (search algorithm) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal using an audio front end (search algorithm) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
GB2391679A
CLAIM 23
. A speech recognition circuit according to any preceding claim , wherein said lexical memory means store said lexical data as Hidden Markov Models , and each processor is operative to perform the Viterbi search algorithm (front end) using a respective part of said lexical data .

GB2391679A
CLAIM 46
. A speech recognition method (speech recognition method) comprising : receiving processed speech parameters ;
storing complete lexical data for word recognition in a plurality of lexical memory means , each lexical memory means containing part of said complete lexical data ;
processing the speech parameters in parallel using a plurality of processors connected in parallel , said processors being arranged in groups of processors , each group of processors being connected to a said lexical memory means ;
controlling each processor to process said speech parameters using partial lexical data read from a said lexical memory means ;
and storing the results of the processing of the speech parameters from said processors .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
GB2391679A
CLAIM 46
. A speech recognition method (speech recognition method) comprising : receiving processed speech parameters ;
storing complete lexical data for word recognition in a plurality of lexical memory means , each lexical memory means containing part of said complete lexical data ;
processing the speech parameters in parallel using a plurality of processors connected in parallel , said processors being arranged in groups of processors , each group of processors being connected to a said lexical memory means ;
controlling each processor to process said speech parameters using partial lexical data read from a said lexical memory means ;
and storing the results of the processing of the speech parameters from said processors .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1494711A

Filed: 2002-01-31     Issued: 2004-05-05

System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Stephane H. Maes, Chalapathy V. Neti
US7979277B2
CLAIM 1
. A speech recognition circuit (语音识别) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 2
. A speech recognition circuit (语音识别) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 3
. A speech recognition circuit (语音识别) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 4
. A speech recognition circuit (语音识别) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 5
. A speech recognition circuit (语音识别) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 6
. The speech recognition circuit (语音识别) of claim 1 , comprising control means adapted to implement frame dropping , to discard one (数据执行) or more audio time frames .
CN1494711A
CLAIM 8
. The system of claim 1, wherein the at least one processor is further configured to perform (to discard one) one or more recognition operations on the received multi-modal input data before making the one or more determinations.

CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 7
. The speech recognition circuit (语音识别) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 8
. The speech recognition circuit (语音识别) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 9
. The speech recognition circuit (语音识别) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 10
. The speech recognition circuit (语音识别) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 11
. The speech recognition circuit (语音识别) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 12
. The speech recognition circuit (语音识别) of claim 1 , wherein the audio front end is configured to input a digital audio (多个音频) signal .
CN1494711A
CLAIM 26
. The system of claim 25, wherein the one or more audio (digital audio) capture devices comprise one or more microphones.

CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 13
. A speech recognition circuit (语音识别) of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 14
. A speech recognition circuit (语音识别) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (计算方法, 计算系统) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1494711A
CLAIM 1
. A multi-modal conversational computing system (calculating means, feature calculation, distance calculation), the system comprising: a user interface subsystem configured to input multi-modal data from an environment in which the user interface subsystem is employed, the multi-modal data comprising data associated with a first-mode input sensor and data associated with at least one second-mode input sensor, the environment containing one or more users and one or more devices controllable by the multi-modal system; at least one processor operatively coupled to the user interface subsystem and configured to: (i) receive at least a portion of the multi-modal input data from the user interface subsystem; (ii) determine an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and (iii) cause one or more actions to be performed in the environment based on at least one of the determined intent, the determined focus and the determined mood; and a memory operatively coupled to the at least one processor for storing at least a portion of the results of the intent, focus and mood determinations made by the processor for possible use in subsequent determinations.

CN1494711A
CLAIM 10
. A computer-based conversational computing method (calculating means, feature calculation, distance calculation), the method comprising the steps of: obtaining multi-modal data from an environment containing one or more users and one or more controllable devices, the multi-modal data comprising data associated with a first-mode input sensor and data associated with at least one second-mode input sensor; determining an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data; causing one or more actions to be performed in the environment based on at least one of the determined intent, the determined focus and the determined mood; and storing at least a portion of the results of the intent, focus and mood determinations for possible use in subsequent determinations.

CN1494711A
CLAIM 33
. The system of claim 32, wherein one of the one or more recognition operations comprises speech recognition (speech recognition circuit)

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (计算方法, 计算系统) , to the distance calculation (计算方法, 计算系统) , and to the word identification .
CN1494711A
CLAIM 1
. A multi-modal conversational computing system (calculating means, feature calculation, distance calculation), the system comprising: a user interface subsystem configured to input multi-modal data from an environment in which the user interface subsystem is employed, the multi-modal data comprising data associated with a first-mode input sensor and data associated with at least one second-mode input sensor, the environment containing one or more users and one or more devices controllable by the multi-modal system; at least one processor operatively coupled to the user interface subsystem and configured to: (i) receive at least a portion of the multi-modal input data from the user interface subsystem; (ii) determine an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the received multi-modal input data; and (iii) cause one or more actions to be performed in the environment based on at least one of the determined intent, the determined focus and the determined mood; and a memory operatively coupled to the at least one processor for storing at least a portion of the results of the intent, focus and mood determinations made by the processor for possible use in subsequent determinations.

CN1494711A
CLAIM 10
. A computer-based conversational computing method (calculating means, feature calculation, distance calculation), the method comprising the steps of: obtaining multi-modal data from an environment containing one or more users and one or more controllable devices, the multi-modal data comprising data associated with a first-mode input sensor and data associated with at least one second-mode input sensor; determining an intent, a focus and a mood of at least one of the one or more users based on at least a portion of the obtained multi-modal input data; causing one or more actions to be performed in the environment based on at least one of the determined intent, the determined focus and the determined mood; and storing at least a portion of the results of the intent, focus and mood determinations for possible use in subsequent determinations.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1215658A2

Filed: 2001-11-29     Issued: 2002-06-19

Visual activation of voice controlled apparatus

(Original Assignee) Hewlett Packard Co     (Current Assignee) HP Inc

Stephen John Hinde, Timothy Alan Wilkinson, Stephen Pollard, Andrew Arthur Hunter
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (same characteristic) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1215658A2
CLAIM 4
A method according to claim 1 , further involving : detecting when the user is speaking and determining characteristics of the user's voice ;
where the user is detected as speaking whilst the apparatus is initially enabled for voice control , continuing enablement of the apparatus for voice control following the user ceasing to look towards the apparatus but only in respect of a voice having the same characteristics (time frame) as that of the voice detected whilst the apparatus was initially enabled , and only whilst that voice continues speaking and for a timeout period thereafter , recommencement of speaking by the same voice during this timeout period continuing enablement of voice control with timing of the timeout period being reset .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (same characteristic) .
EP1215658A2
CLAIM 4
A method according to claim 1 , further involving : detecting when the user is speaking and determining characteristics of the user's voice ;
where the user is detected as speaking whilst the apparatus is initially enabled for voice control , continuing enablement of the apparatus for voice control following the user ceasing to look towards the apparatus but only in respect of a voice having the same characteristics (time frame) as that of the voice detected whilst the apparatus was initially enabled , and only whilst that voice continues speaking and for a timeout period thereafter , recommencement of speaking by the same voice during this timeout period continuing enablement of voice control with timing of the timeout period being reset .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
EP1215658A2
CLAIM 14
A method according to claim 1 , wherein speech recognition means (speech accelerator) of the apparatus ignores voice input from the user unless whilst the user is looking towards the apparatus , the user speaks a predetermined key word .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (said time) signal .
EP1215658A2
CLAIM 24
An arrangement according to claim 22 , further comprising a speaking detector for detecting when a user is speaking , the control means comprising : initial-enablement means for effecting the said initial enabling of the apparatus for voice control ;
delayed-disablement means including timing means for timing a timeout period ;
and means for activating the delayed-disablement means upon the speaking detector detecting a user speaking whilst the apparatus is initially enabled by the initial-enablement means ;
the delayed-disablement means , when activated , being operative to keep the apparatus enabled for voice control following the detection means ceasing to detect that the user is looking towards the apparatus but only whilst the speaking detector continues to detect that the user is speaking and for the duration thereafter of the said timeout (digital audio, digital audio signal) period as timed by the timing means , the delayed-disablement means being responsive to the speaking detector detecting recommencement of speaking by the user during this timeout period to reset timing of the timeout period .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (same characteristic) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1215658A2
CLAIM 4
A method according to claim 1 , further involving : detecting when the user is speaking and determining characteristics of the user's voice ;
where the user is detected as speaking whilst the apparatus is initially enabled for voice control , continuing enablement of the apparatus for voice control following the user ceasing to look towards the apparatus but only in respect of a voice having the same characteristics (time frame) as that of the voice detected whilst the apparatus was initially enabled , and only whilst that voice continues speaking and for a timeout period thereafter , recommencement of speaking by the same voice during this timeout period continuing enablement of voice control with timing of the timeout period being reset .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (same characteristic) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1215658A2
CLAIM 4
A method according to claim 1 , further involving : detecting when the user is speaking and determining characteristics of the user's voice ;
where the user is detected as speaking whilst the apparatus is initially enabled for voice control , continuing enablement of the apparatus for voice control following the user ceasing to look towards the apparatus but only in respect of a voice having the same characteristics (time frame) as that of the voice detected whilst the apparatus was initially enabled , and only whilst that voice continues speaking and for a timeout period thereafter , recommencement of speaking by the same voice during this timeout period continuing enablement of voice control with timing of the timeout period being reset .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (same characteristic) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
EP1215658A2
CLAIM 4
A method according to claim 1 , further involving : detecting when the user is speaking and determining characteristics of the user's voice ;
where the user is detected as speaking whilst the apparatus is initially enabled for voice control , continuing enablement of the apparatus for voice control following the user ceasing to look towards the apparatus but only in respect of a voice having the same characteristics (time frame) as that of the voice detected whilst the apparatus was initially enabled , and only whilst that voice continues speaking and for a timeout period thereafter , recommencement of speaking by the same voice during this timeout period continuing enablement of voice control with timing of the timeout period being reset .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6721699B2

Filed: 2001-11-12     Issued: 2004-04-13

Method and system of Chinese speech pitch extraction

(Original Assignee) Intel Corp     (Current Assignee) Intel Corp

Bo Xu, Liang He, Wen Ke
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (square root) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voice candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US6721699B2
CLAIM 4
. The method of claim 1 , wherein the unvoiced intensity function is : I(C_0) = VoicingThreshold + (1.0 − √(NormalizedEnergy) (time frame))² · (1.0 − VoicingThreshold) ;
and the voiced intensity function is : I(C_k) = R*(m_k) · (MinimumWeight + (log10[F(C_k) − F_min] / log10[F_max − F_min]) · (1.0 − MinimumWeight)) .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (window function) for a predetermined time frame (square root) .
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voice candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US6721699B2
CLAIM 4
. The method of claim 1 , wherein the unvoiced intensity function is : I(C_0) = VoicingThreshold + (1.0 − √(NormalizedEnergy) (time frame))² · (1.0 − VoicingThreshold) ;
and the voiced intensity function is : I(C_k) = R*(m_k) · (MinimumWeight + (log10[F(C_k) − F_min] / log10[F_max − F_min]) · (1.0 − MinimumWeight)) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6721699B2
CLAIM 10
. The method of claim 1 , further comprising : segmenting a speech signal (result memory) into a plurality of frames .
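US6721699B2 claim 10 recites segmenting a speech signal into a plurality of frames. The standard way to do this in pitch and feature extraction is fixed-length frames taken at a regular hop; the sketch below shows that convention, with frame length and hop size chosen arbitrarily for the example (the patent does not fix these values).

```python
# Sketch of segmenting a signal into overlapping fixed-length frames.
# frame_len and hop are illustrative choices, not values from the patent.
def segment(signal, frame_len, hop):
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

frames = segment(list(range(10)), frame_len=4, hop=2)
```

With a hop smaller than the frame length, consecutive frames overlap, which smooths frame-to-frame estimates such as the pitch track claim 1 builds.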

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (window function) .
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voiced candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (square root) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
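The pipelined data flow of claim 14 above (audio front end, distance calculation, search stage) can be modelled with chained Python generators. This is a minimal sketch: the energy "feature" and the scalar acoustic "states" are placeholders for the claimed feature vectors and acoustic model, and the argmin stands in for the lexical-tree search.

```python
def front_end(frames):
    # Stage 1: reduce each audio frame to a feature vector
    for frame in frames:
        energy = sum(s * s for s in frame)
        yield [energy]  # placeholder one-component feature vector

def distance_stage(features, states):
    # Stage 2: one distance per acoustic state for each frame
    for fv in features:
        yield [abs(fv[0] - mean) for mean in states]

def search_stage(distances):
    # Stage 3: pick the best-matching state for each frame
    for d in distances:
        yield min(range(len(d)), key=d.__getitem__)

def recognize(frames, states):
    # Chained generators: each frame flows from the front end, to the
    # distance calculation, to the search, mirroring the pipeline.
    return list(search_stage(distance_stage(front_end(frames), states)))
```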
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voiced candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US6721699B2
CLAIM 4
. The method of claim 1 , wherein the unvoiced intensity function is : I (C 0) = VoicingThreshold + (1.0 − square root (time frame) of NormalizedEnergy)² × (1.0 − VoicingThreshold) ;
and the voiced intensity function is : I (C k) = R* (m k) × (MinimumWeight + (log 10 [F (C k) − F min] / log 10 [F max − F min]) × (1.0 − MinimumWeight)) .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (window function) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (square root) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voiced candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US6721699B2
CLAIM 4
. The method of claim 1 , wherein the unvoiced intensity function is : I (C 0) = VoicingThreshold + (1.0 − square root (time frame) of NormalizedEnergy)² × (1.0 − VoicingThreshold) ;
and the voiced intensity function is : I (C k) = R* (m k) × (MinimumWeight + (log 10 [F (C k) − F min] / log 10 [F max − F min]) × (1.0 − MinimumWeight)) .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (window function) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (square root) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6721699B2
CLAIM 1
. A method for Chinese speech pitch extraction , comprising : pre-computing an anti-bias auto-correlation of a Hamming window function (audio signal) ;
for at least one frame , saving a first candidate as an unvoiced candidate , and detecting other voiced candidates from the anti-bias auto-correlation function ;
and calculating a cost value for a pitch path according to a voiced/unvoiced intensity function based on the unvoiced and voiced candidates , saving a predetermined number of least-cost paths , and outputting at least a portion of contiguous frames with low time delay .

US6721699B2
CLAIM 4
. The method of claim 1 , wherein the unvoiced intensity function is : I (C 0) = VoicingThreshold + (1.0 − square root (time frame) of NormalizedEnergy)² × (1.0 − VoicingThreshold) ;
and the voiced intensity function is : I (C k) = R* (m k) × (MinimumWeight + (log 10 [F (C k) − F min] / log 10 [F max − F min]) × (1.0 − MinimumWeight)) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1318505A1

Filed: 2001-09-04     Issued: 2003-06-11

Emotion recognizing method, sensibility creating method, device, and software

(Original Assignee) A G I Inc; AGI Inc     (Current Assignee) A G I Inc ; AGI Inc

Shunji Mitsuyoshi
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end (signal input) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
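Claim 1 does not fix the distance measure; in HMM-based recognizers a common choice is the negative log-likelihood of a Gaussian acoustic state. A minimal sketch under that assumption (diagonal covariance; all names illustrative):

```python
import math

def gaussian_distance(feature, mean, var):
    # Negative log-likelihood of a diagonal-covariance Gaussian state;
    # a smaller distance indicates greater similarity between the
    # feature vector and the acoustic state.
    d = 0.0
    for f, m, v in zip(feature, mean, var):
        d += 0.5 * (math.log(2 * math.pi * v) + (f - m) ** 2 / v)
    return d

def distances_for_frame(feature, states):
    # One distance per predetermined acoustic state, as in claim 1;
    # states is a list of (mean, variance) vector pairs.
    return [gaussian_distance(feature, m, v) for m, v in states]
```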
EP1318505A1
CLAIM 2
An emotion detecting system for detecting an emotion of a subject , comprising : a voice inputting unit for inputting a voice signal ;
an intensity detecting unit for detecting an intensity of a voice based on the voice signal inputted (audio front end) by said voice inputting unit ;
a tempo detecting unit for detecting speed the voice emerges at as a tempo based on the voice signal inputted by said voice inputting unit ;
an intonation detecting unit for detecting intonation expressing an intensity-change pattern in a word of the voice based on the voice signal inputted by said voice inputting unit ;
a change-amount detecting unit for obtaining amounts of change in the intensity of the voice detected by said intensity detecting unit , the tempo of the voice detected by said tempo detecting unit , and the intonation in the voice detected by said intonation detecting unit , respectively ;
and an emotion detecting unit for outputting signals expressing states of emotion of at least anger , sadness , and pleasure , respectively , based on the amounts of change detected by said change-amount detecting unit .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
EP1318505A1
CLAIM 17
The sensibility generator according to claim 16 , further comprising : a voice recognition (speech accelerator) unit for recognizing the voice inputted from said voice inputting unit , and for outputting character information ;
and a natural language processing unit for subjecting vocal information recognized by said voice recognition unit to natural language processing , and for generating meaning information expressing a meaning of the inputted voice .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals (time intervals) to the search stage when the distances for a new frame are available in a result memory (receiving pieces) .
EP1318505A1
CLAIM 3
The emotion detecting system according to claim 2 , wherein said intonation detecting unit includes : a bandpass filter unit for extracting specific frequency components from the voice signal inputted separately for each word ;
an area separating unit for separating a power spectrum of the signal extracted by said bandpass filter unit into a plurality of areas based on the intensity of the power spectrum ;
and an intonation calculating unit for calculating a value of the intonation based on time intervals (accelerator signals) between respective centers of the plurality of areas separated by said area separating unit .
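EP1318505A1's claim 3 derives intonation from the time intervals between centers of power-spectrum areas. The sketch below assumes a precomputed power envelope and a simple threshold rule for separating areas (the claim leaves the separation rule open); the bandpass filtering step is omitted.

```python
def intonation_intervals(power, threshold):
    # Separate the power envelope into contiguous areas at or above a
    # threshold, then return the time intervals between area centers.
    areas, start = [], None
    for t, p in enumerate(power):
        if p >= threshold and start is None:
            start = t
        elif p < threshold and start is not None:
            areas.append((start + t - 1) / 2.0)  # center of the area
            start = None
    if start is not None:
        areas.append((start + len(power) - 1) / 2.0)
    return [b - a for a, b in zip(areas, areas[1:])]
```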

EP1318505A1
CLAIM 5
The emotion detecting system according to claim 2 , further comprising : an emotion information storing unit for sequentially receiving pieces (result memory) of information concerning the states of emotion detected by said emotion detecting unit and for storing the pieces of information therein ;
and an oblivion processing unit for deleting information which has been stored for a predetermined period of time since the information was initially stored , among the pieces of information concerning states of emotion stored in said emotion information storing unit in the past , and for excluding at least information showing a larger amount of change in emotion than a predetermined amount and information matching a predetermined change pattern , from said information to be deleted .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end (signal input) is configured to input a digital audio signal .
EP1318505A1
CLAIM 2
An emotion detecting system for detecting an emotion of a subject , comprising : a voice inputting unit for inputting a voice signal ;
an intensity detecting unit for detecting an intensity of a voice based on the voice signal inputted (audio front end) by said voice inputting unit ;
a tempo detecting unit for detecting speed the voice emerges at as a tempo based on the voice signal inputted by said voice inputting unit ;
an intonation detecting unit for detecting intonation expressing an intensity-change pattern in a word of the voice based on the voice signal inputted by said voice inputting unit ;
a change-amount detecting unit for obtaining amounts of change in the intensity of the voice detected by said intensity detecting unit , the tempo of the voice detected by said tempo detecting unit , and the intonation in the voice detected by said intonation detecting unit , respectively ;
and an emotion detecting unit for outputting signals expressing states of emotion of at least anger , sadness , and pleasure , respectively , based on the amounts of change detected by said change-amount detecting unit .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end (signal input) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1318505A1
CLAIM 2
An emotion detecting system for detecting an emotion of a subject , comprising : a voice inputting unit for inputting a voice signal ;
an intensity detecting unit for detecting an intensity of a voice based on the voice signal inputted (audio front end) by said voice inputting unit ;
a tempo detecting unit for detecting speed the voice emerges at as a tempo based on the voice signal inputted by said voice inputting unit ;
an intonation detecting unit for detecting intonation expressing an intensity-change pattern in a word of the voice based on the voice signal inputted by said voice inputting unit ;
a change-amount detecting unit for obtaining amounts of change in the intensity of the voice detected by said intensity detecting unit , the tempo of the voice detected by said tempo detecting unit , and the intonation in the voice detected by said intonation detecting unit , respectively ;
and an emotion detecting unit for outputting signals expressing states of emotion of at least anger , sadness , and pleasure , respectively , based on the amounts of change detected by said change-amount detecting unit .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end (signal input) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1318505A1
CLAIM 2
An emotion detecting system for detecting an emotion of a subject , comprising : a voice inputting unit for inputting a voice signal ;
an intensity detecting unit for detecting an intensity of a voice based on the voice signal inputted (audio front end) by said voice inputting unit ;
a tempo detecting unit for detecting speed the voice emerges at as a tempo based on the voice signal inputted by said voice inputting unit ;
an intonation detecting unit for detecting intonation expressing an intensity-change pattern in a word of the voice based on the voice signal inputted by said voice inputting unit ;
a change-amount detecting unit for obtaining amounts of change in the intensity of the voice detected by said intensity detecting unit , the tempo of the voice detected by said tempo detecting unit , and the intonation in the voice detected by said intonation detecting unit , respectively ;
and an emotion detecting unit for outputting signals expressing states of emotion of at least anger , sadness , and pleasure , respectively , based on the amounts of change detected by said change-amount detecting unit .




US7979277B2

US6594348B1

Filed: 2001-08-24     Issued: 2003-07-15

Voice browser and a method at a voice browser

(Original Assignee) Pipebeach AB     (Current Assignee) Hewlett Packard Development Co LP

Hans Bjurstrom, Christer Granberg, Jesper Hogberg, Berndt Johannsen, Scott Mcglashan
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (audio objects) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6594348B1
CLAIM 1
. A voice browser in a voice browser system , said voice browser being arranged at a server connected to the Internet and responsive to Dual Tone MultiFrequency (DTMF) tones received from a telecommunications network , wherein said voice browser includes : an object model comprising elements defined in a retrieved HTML page and defining navigation positions within said HTML page ;
audio means for playing an audio stream derived from an element of said HTML page ;
a voice browser controller for controlling the operation of said voice browser ;
and a dialogue state structure , having a plurality of states and transitions between states , storing text and audio objects (calculating distances) to be outputted to said audio means ;
and a dialogue controller arranged to control a dialogue with a user based on said dialogue state structure and to respond to an interpreted DTMF tone with an event to said voice browser controller , wherein said voice browser controller , in response to an event including an interpreted DTMF tone of a first predetermined set of interpreted DTMF tones , is arranged to control voice browser function associated with said interpreted DTMF tone and to control from which state in said dialogue state structure , or in a second dialogue state structure associated with a second retrieved HTML page , said dialogue should resume after an execution of said function ;
said voice browser controller , in response to an event including an interpreted DTMF tone of a second predetermined set of interpreted DTMF tones , is arranged to direct said interpreted DTMF tone to an application of said retrieved HTML page ;
each of said states is associated with a corresponding position in said object model ;
and said voice browser further includes synchronisation means for synchronising said dialogue state structure , with respect to a current state , with a new position in said object model .
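The dialogue behaviour recited in US6594348B1 claim 1 (a state structure with transitions, plus two DTMF sets routed either to browser functions or to the page application) can be caricatured as a small state machine. All names and the membership of the two tone sets below are hypothetical, not taken from the patent.

```python
BROWSER_TONES = {"*", "#"}      # first set: browser functions (assumed)
APP_TONES = set("0123456789")   # second set: passed to the application

class DialogueController:
    def __init__(self, transitions, start):
        # transitions: {state: {tone: next_state}} models the dialogue
        # state structure's states and transitions between states
        self.transitions = transitions
        self.state = start
        self.events = []

    def on_dtmf(self, tone):
        if tone in BROWSER_TONES:
            # Event to the voice browser controller; the current state
            # is recorded so the dialogue can resume after the function
            self.events.append(("browser", tone, self.state))
        elif tone in APP_TONES:
            # Tone directed to the application of the retrieved page
            self.events.append(("application", tone, self.state))
            self.state = self.transitions.get(self.state, {}).get(
                tone, self.state)
```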

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (time window) and/or availability of space for storing more feature vectors and/or distance results .
US6594348B1
CLAIM 10
. A voice browser as claimed in claim 9 , wherein said voice browser controller is arranged to , in response to each additionally received second DTMF tone interpretation , received within a respective time window (distance results) , revert from the current position of said object model to a previous position designating the start of the previously read HTML element , until the top position of said object model designating the start of the HTML page is reached .




US7979277B2

US20030036903A1

Filed: 2001-08-16     Issued: 2003-02-20

Retraining and updating speech models for speech recognition

(Original Assignee) Sony Corp     (Current Assignee) Sony Corp ; Sony Electronics Inc

Courtney Konopka, Lars Almstrand
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor (digital representation) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20030036903A1
CLAIM 1
. A method of updating speech models for speech recognition , comprising the steps of : identifying speech data (audio time frame, time frame) for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collecting said identified speech data for similar utterances from said class of users ;
correcting said predetermined set of stored speech models as a function of the collected speech data so that the corrected speech models are an improved match to said utterances than said predetermined set of stored speech models ;
and updating said predetermined set of speech models with said corrected speech models for subsequent speech recognition of utterances from said class of users .
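The collect-and-correct loop of US20030036903A1 claim 1 can be sketched under the simplifying assumptions that a "speech model" is a feature-vector mean and that the mismatch measure is Euclidean distance; both assumptions, the threshold, and the averaging correction are illustrative, not the patent's.

```python
def update_models(models, utterances, threshold):
    # models: {label: feature-vector mean}; utterances: (label, vector)
    # pairs. Collect utterances that differ from their stored model by
    # at least the threshold, then correct each affected model toward
    # the mean of the collected data ("improved match" in claim 1).
    collected = {}
    for label, vec in utterances:
        model = models[label]
        dist = sum((a - b) ** 2 for a, b in zip(vec, model)) ** 0.5
        if dist >= threshold:
            collected.setdefault(label, []).append(vec)
    updated = dict(models)
    for label, vecs in collected.items():
        dims = zip(*([updated[label]] + vecs))
        updated[label] = [sum(d) / len(d) for d in dims]
    return updated
```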

US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US20030036903A1
CLAIM 23
. A method of creating speech models for speech recognition , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of a particular class of users ;
generating digital representations (second processor) of utterances from said users ;
collecting from said particular class of users those digital representations of similar utterances that differ by at least a predetermined amount from a set of stored speech models that are determined to be a best match to said utterances , and collecting corrections to said set of stored speech models that reduce the differences between an utterance and said set of models to a minimum ;
building a set of updated speech models based on said collected corrections when the number of utterances that differ from said stored best match set of speech models by at least said predetermined amount , exceeds a threshold ;
and using said set of updated speech models as said stored set of speech models for further speech recognition .

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US20030036903A1
CLAIM 1
. A method of updating speech models for speech recognition , comprising the steps of : identifying speech data (audio time frame, time frame) for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collecting said identified speech data for similar utterances from said class of users ;
correcting said predetermined set of stored speech models as a function of the collected speech data so that the corrected speech models are an improved match to said utterances than said predetermined set of stored speech models ;
and updating said predetermined set of speech models with said corrected speech models for subsequent speech recognition of utterances from said class of users .

US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (plural user) .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US20030036903A1
CLAIM 24
. A system for updating speech models for speech recognition , comprising : plural user (result memory) processors each programmed to : identify acoustic subword data for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collect said identified acoustic subword data for similar utterances from said class of users ;
and correct said predetermined set of stored speech models as a function of the collected acoustic subword data so that the corrected speech models are a closer match to said utterances than said predetermined set of stored speech models ;
and a central processor , programmed to update said predetermined set of speech models at user processors with said corrected speech models for subsequent speech recognition of utterances from said class of users .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .
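Claim 13 of US7979277B2 above specifies that the distance comprises a Mahalanobis distance. A minimal sketch of that computation for a Gaussian acoustic state follows; `mahalanobis_distance` is an illustrative helper assuming a covariance matrix is available for the state, not code from the patent.

```python
import numpy as np

def mahalanobis_distance(feature_vector, state_mean, state_cov):
    """Mahalanobis distance between a feature vector and a Gaussian acoustic
    state: d = sqrt((x - mu)^T Sigma^{-1} (x - mu)). With a diagonal
    covariance (common in HMM acoustic models) this reduces to a
    variance-weighted Euclidean distance."""
    x = np.asarray(feature_vector, dtype=float)
    mu = np.asarray(state_mean, dtype=float)
    cov = np.asarray(state_cov, dtype=float)
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

With the identity covariance the result is the ordinary Euclidean distance, which is a convenient sanity check.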

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US20030036903A1
CLAIM 1
. A method of updating speech models for speech recognition , comprising the steps of : identifying speech data (audio time frame, time frame) for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collecting said identified speech data for similar utterances from said class of users ;
correcting said predetermined set of stored speech models as a function of the collected speech data so that the corrected speech models are an improved match to said utterances than said predetermined set of stored speech models ;
and updating said predetermined set of speech models with said corrected speech models for subsequent speech recognition of utterances from said class of users .

US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20030036903A1
CLAIM 1
. A method of updating speech models for speech recognition , comprising the steps of : identifying speech data (audio time frame, time frame) for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collecting said identified speech data for similar utterances from said class of users ;
correcting said predetermined set of stored speech models as a function of the collected speech data so that the corrected speech models are an improved match to said utterances than said predetermined set of stored speech models ;
and updating said predetermined set of speech models with said corrected speech models for subsequent speech recognition of utterances from said class of users .
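Claims 14-16 of US7979277B2 above all recite the same three-stage pipeline: audio front end to distance calculation to lexical-tree search, with data flowing stage to stage. A minimal concurrency sketch follows, using bounded queues to stand in for the stage-to-stage links (and for the two-processor split of claim 1); the three stage functions are caller-supplied assumptions, not APIs from the patent.

```python
import queue
import threading

def run_pipeline(frames, front_end, distance_calc, search_step):
    """Sketch of the claimed pipeline: each stage runs concurrently and
    passes its output downstream through a bounded FIFO queue, so the
    front end can work on frame N+1 while the search stage consumes
    distances for frame N."""
    q_feat = queue.Queue(maxsize=4)   # front end -> distance calculation
    q_dist = queue.Queue(maxsize=4)   # distance calculation -> search stage
    results = []

    def fe_stage():
        for frame in frames:
            q_feat.put(front_end(frame))      # feature vector per audio time frame
        q_feat.put(None)                      # end-of-stream marker

    def calc_stage():
        while (fv := q_feat.get()) is not None:
            q_dist.put(distance_calc(fv))     # distances to acoustic states
        q_dist.put(None)

    def search_stage():
        while (d := q_dist.get()) is not None:
            results.append(search_step(d))    # lexical-tree search per frame

    threads = [threading.Thread(target=t) for t in (fe_stage, calc_stage, search_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each link is a single FIFO with one producer and one consumer, frame order is preserved end to end.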

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US20030036903A1
CLAIM 1
. A method of updating speech models for speech recognition , comprising the steps of : identifying speech data (audio time frame, time frame) for a predetermined set of utterances from a class of users , said utterances differing from a predetermined set of stored speech models by at least a predetermined amount ;
collecting said identified speech data for similar utterances from said class of users ;
correcting said predetermined set of stored speech models as a function of the collected speech data so that the corrected speech models are an improved match to said utterances than said predetermined set of stored speech models ;
and updating said predetermined set of speech models with said corrected speech models for subsequent speech recognition of utterances from said class of users .

US20030036903A1
CLAIM 9
. A method of building speech models for recognizing speech (speech recognition circuit, word identification) of users of a particular class , comprising the steps of : registering users in accordance with predetermined criteria that characterize the speech of said particular class of users ;
collecting a set of registration utterances from a user ;
determining a best match of each said utterance to a stored speech model ;
collecting utterances from users of said particular class that differ from said stored , best match speech model by at least a predetermined amount ;
and retraining said stored speech model to reduce to less than said predetermined amount , the difference between the retrained speech model and said identified utterances from said users of said particular class .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20020161579A1

Filed: 2001-08-14     Issued: 2002-10-31

Systems and methods for automated audio transcription, translation, and transfer

(Original Assignee) Speche Communications     (Current Assignee) COURTROOM CONNECT

Richard Saindon, Stephen Brand
US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (comprises information) adapted to implement frame dropping , to discard one or more audio time frames .
US20020161579A1
CLAIM 14
. The system of claim 9 , wherein said audio information comprises information (control means) obtained from live event audio , speech audio , and motion picture audio .
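Claim 6 of US7979277B2 above recites control means adapted to implement frame dropping, discarding one or more audio time frames. One plausible reading is a bounded buffer that discards the oldest frame rather than stalling the front end when downstream processing falls behind; the sketch below illustrates that reading, with the fixed capacity an assumption for illustration only.

```python
from collections import deque

class FrameDroppingBuffer:
    """Sketch of frame-dropping control: when the buffer between the front
    end and the downstream stage is full, the oldest audio time frame is
    discarded instead of blocking. The capacity and drop-oldest policy are
    illustrative assumptions, not taken from the patent."""
    def __init__(self, capacity=8):
        self.frames = deque(maxlen=capacity)  # deque evicts from the left when full
        self.dropped = 0

    def push(self, frame):
        if len(self.frames) == self.frames.maxlen:
            self.dropped += 1                 # count each discarded time frame
        self.frames.append(frame)

    def pop(self):
        return self.frames.popleft() if self.frames else None
```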

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (first language) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20020161579A1
CLAIM 16
. The system of claim 9 , further comprising a language translator configured to receive said text information and convert said text information from a first language (speech accelerator) into one or more other languages .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6483927B2

Filed: 2001-06-28     Issued: 2002-11-19

Synchronizing readers of hidden auxiliary data in quantization-based data hiding schemes

(Original Assignee) Digimarc Corp     (Current Assignee) Digimarc Corp

Hugh L. Brunk, Ravi K. Sharma
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (message signal, audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .
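Claim 1 of US6483927B2 above embeds symbols by quantizing feature samples with quantizers associated with those symbols. A minimal sketch of that scheme (quantization index modulation style) follows: two interleaved scalar quantizers, one per bit value, with the step size an illustrative choice; `qim_embed` and `qim_read` are hypothetical helpers, not code from the patent.

```python
import numpy as np

def qim_embed(samples, bits, step=4.0):
    """Embed one bit per sample: each symbol selects a quantizer (lattices
    offset by 0 or step/2), and the sample is quantized with the selected
    quantizer, as in the claim's symbol-to-quantizer association."""
    out = []
    for s, b in zip(samples, bits):
        offset = (step / 2.0) * b                        # quantizer chosen by the symbol
        out.append(np.round((s - offset) / step) * step + offset)
    return out

def qim_read(samples, step=4.0):
    """Recover the symbols by asking which quantizer each sample lies on
    (nearest of the two interleaved lattices)."""
    bits = []
    for s in samples:
        d0 = abs(s - np.round(s / step) * step)
        d1 = abs(s - (np.round((s - step / 2.0) / step) * step + step / 2.0))
        bits.append(0 if d0 <= d1 else 1)
    return bits
```

The reference-signal mechanism of the claim serves to re-synchronize the reader after geometric or temporal distortion; this sketch shows only the undistorted embed/read round trip.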

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means adapted to implement frame dropping (video signal) , to discard one or more audio time frames .
US6483927B2
CLAIM 3
. The method of claim 2 wherein the media signal comprises a video signal (frame dropping) .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (message signal, audio signal) for a predetermined time frame .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (message signal, audio signal) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (message signal, audio signal) .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (message signal, audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (message signal, audio signal) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (message signal, audio signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6483927B2
CLAIM 1
. A method of hiding auxiliary data in a media signal such that the auxiliary data is imperceptible to a viewer or listener yet recoverable by an automated auxiliary data reader , comprising : quantizing a first set of feature samples of the media signal to embed a reference signal into the media signal , where the reference signal is comprised of symbols associated with quantizers , and the symbols of the reference signal are embedded by quantizing corresponding samples in the first set with the quantizers associated with the symbols ;
quantizing a second set of feature samples of the media signal to embed a message signal (speech accelerator, audio signal) into the media signal ;
wherein the embedded reference signal is readable in a geometrically or temporally distorted version of the media signal by estimating quantizers for symbols of the reference signal , and using the estimated quantizers to detect the reference signal .

US6483927B2
CLAIM 6
. The method of claim 1 wherein the media signal comprises an audio signal (speech accelerator, audio signal) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6785647B2

Filed: 2001-04-20     Issued: 2004-08-31

Speech recognition system with network accessible speech processing resources

(Original Assignee) William R. Hutchison     (Current Assignee) ME ME ME Inc ; Sensory

William R. Hutchison
US7979277B2
CLAIM 1
. A speech recognition circuit (extraction process) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 2
. A speech recognition circuit (extraction process) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 3
. A speech recognition circuit (extraction process) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 4
. A speech recognition circuit (extraction process) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 5
. A speech recognition circuit (extraction process) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 6
. The speech recognition circuit (extraction process) of claim 1 , comprising control means adapted to implement frame dropping , to discard one (third interface) or more audio time frames .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US6785647B2
CLAIM 25
. A speech-enabled software application comprising : a first interface for receiving a voice signal from a speaker ;
a second interface for sending the voice signal over a network to a centralized speech recognition server ;
a third interface (to discard one) for receiving phoneme probabilities from the speech recognition server corresponding to the voice signal ;
and processes for using the phoneme probabilities to launch speech-enabled functions for the speaker .
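Claim 25 of US6785647B2 above recites three interfaces: receiving a voice signal, sending it over a network to a centralized recognition server, and receiving phoneme probabilities back, which then launch speech-enabled functions. A minimal structural sketch follows; `send_to_server` is a caller-supplied stand-in for the network link and the phoneme-keyed handler table is an illustrative assumption, not an API from the patent.

```python
class SpeechEnabledApp:
    """Sketch of the claim-25 architecture: first interface receives the
    voice signal, second interface forwards it to the recognition server,
    third interface receives phoneme probabilities, and the probabilities
    drive speech-enabled functions."""
    def __init__(self, send_to_server, functions):
        self.send_to_server = send_to_server  # second interface (network link)
        self.functions = functions            # phoneme -> speech-enabled function

    def on_voice_signal(self, voice_signal):
        # First interface: voice signal in; third interface: probabilities back.
        probs = self.send_to_server(voice_signal)
        best = max(probs, key=probs.get)      # most likely phoneme
        handler = self.functions.get(best)
        return handler() if handler else None
```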

US7979277B2
CLAIM 7
. The speech recognition circuit (extraction process) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 8
. The speech recognition circuit (extraction process) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 9
. The speech recognition circuit (extraction process) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 10
. The speech recognition circuit (extraction process) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 11
. The speech recognition circuit (extraction process) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 12
. The speech recognition circuit (extraction process) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .

US7979277B2
CLAIM 13
. A speech recognition circuit (extraction process) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .
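Claim 13 narrows the claimed distance to a Mahalanobis distance. As a point of reference, a minimal sketch of that computation between a feature vector and a Gaussian acoustic state follows; it is illustrative only, and practical HMM acoustic models typically use diagonal covariances, in which case it reduces to a weighted Euclidean distance.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Mahalanobis distance between feature vector x and a Gaussian state.

    x, mean: 1-D arrays of equal length; cov: covariance matrix of the state.
    """
    diff = x - mean
    # sqrt of the quadratic form diff^T * cov^{-1} * diff
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

With an identity covariance the result is simply the Euclidean distance between the feature vector and the state mean.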

US7979277B2
CLAIM 14
. A speech recognition circuit (extraction process) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .
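Claim 14 recites an audio front end, a distance calculating means, and a search stage connected to enable pipelined data flow. The following sketch illustrates such a three-stage pipeline with bounded buffers between stages; the stage functions and buffer sizes are placeholders, not the patented implementations.

```python
import queue
import threading

def run_pipeline(audio_frames, front_end, distance_calc, search):
    """Illustrative three-stage pipeline: front end -> distances -> search."""
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)  # bounded buffers
    results = []

    def stage_front():
        for frame in audio_frames:
            q1.put(front_end(frame))   # feature vectors flow to stage 2
        q1.put(None)                   # end-of-stream sentinel

    def stage_distance():
        while (fv := q1.get()) is not None:
            q2.put(distance_calc(fv))  # distances flow to the search stage
        q2.put(None)

    threads = [threading.Thread(target=stage_front),
               threading.Thread(target=stage_distance)]
    for t in threads:
        t.start()
    while (d := q2.get()) is not None:  # search stage consumes distances
        results.append(search(d))
    for t in threads:
        t.join()
    return results
```

All three stages run concurrently, so a new frame can enter the front end while distances for an earlier frame are still being searched.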

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (extraction process) .
US6785647B2
CLAIM 21
. The speech recognition system of claim 18 wherein the speaker-dependent speech recognition resources comprise feature extraction processes (speech recognition circuit, word identification) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20010047258A1

Filed: 2001-03-16     Issued: 2001-11-29

Method and system of configuring a speech recognition system

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Anthony Rodrigo
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (determining means) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20010047258A1
CLAIM 1
. Speech control system for a telecommunication network (4) , comprising : a) loading means for loading a state definition information from a network application server (5) , wherein said state definition information defines possible states of the network application server (5) ;
b) determining means (calculating distances) for determining a set of valid commands for said network application server (5) on the basis of said state definition information ;
and c) checking means for checking a validity of a text command , obtained by converting an input speech command to be used for controlling said network application server (5) , by comparing said text command with said determined set of valid commands .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20010047258A1
CLAIM 4
. System according to any one of the preceding claims , wherein said speech control system comprises a speech recognition means (speech accelerator) (6) for converting an input speech command received from a subscriber terminal (1) into said text command to be supplied to said network application server (5) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20020091527A1

Filed: 2001-01-08     Issued: 2002-07-11

Distributed speech recognition server system for mobile internet/intranet communication

(Original Assignee) VerbalTek Inc     (Current Assignee) VerbalTek Inc

Shyue-Chin Shiau
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (mobile phones) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (mobile phones) for a predetermined time frame .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .

US7979277B2
CLAIM 8
. The speech recognition circuit of claim 1 , wherein the processor is configured to divert to another task if the data flow (data flow) stalls .
US20020091527A1
CLAIM 52
. A speech recognition server system for implementation in a communications network having at least one site server , at least one gateway server , at least one content server , and a plurality of clients each having a keypad and a micro-browser , said speech recognition server system comprising : a hotkey , disposed on the keypad , for initializing a voice session ;
a vocoder for generating voice frame data responsive to an input speech ;
a client speech subroutine , coupled to said vocoder , for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom ;
a system-specific profile database for storing and transmitting system-specific client profiles ;
a payload formatter , communicable with said client speech subroutine and said system-specific profile database , for formatting a client payload data flow (data flow) received from said client speech subroutine with data received from said system-specific profile database ;
a speech recognition server , communicable with the gateway server for speech recognition of the formatted client payload ;
a transaction protocol (TP) socket , communicable with said payload formatter and the gateway server , for receiving the formatted client payload from said payload formatter , converting the client payload to a wireless speech TP query , and transmitting the wireless speech TP query via the gateway server through the communications network to said speech recognition server , and further for receiving a recognized wireless speech TP query from said speech recognition server , converting the recognized wireless speech TP query to a resource identifier , and transmitting the resource identifier to the micro-browser for identifying the resource responsive to the resource identifier ;
a wireless transaction protocol socket , communicable with the micro-browser and gateway server , for receiving the resource query from the micro-browser , generating a wireless session resource query , and transmitting the resource query via the gateway server and through the communications network to the contents server , and further for receiving content from the content server via the site server , the communications network , and the gateway server , and transmitting the content via the micro-browser to the client for display ;
and an event handler , communicable with said hotkey , said client speech subroutine , said TP socket , the micro-browser , and said payload formatter , for transmitting event command signals and synchronizing the voice session thereamong .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20020091527A1
CLAIM 61
. A speech recognition server system for implementation in a communications network having a plurality of sites each having a site map and a plurality of sub-sites , said speech recognition server system comprising : a site map table for mapping the site map at the plurality of sites ;
mirroring means , coupled to said site map table , for mirroring the site map at the plurality of sites to said site map table ;
speech recognition means (speech accelerator) for recognizing an input speech selecting one of said plurality of sites and sub-sites ;
and first child process means , coupled to said speech recognition means , for launching one of the plurality of sites responsive to the input speech ;
second child process means , coupled to said speech recognition means , for launching one of the plurality of sub-sites responsive to the input speech ;
and third child process means , coupled to said speech recognition means , for launching information at the sub-site responsive to an input query .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (voice signals) signal .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .

US20020091527A1
CLAIM 52
. A speech recognition server system for implementation in a communications network having at least one site server , at least one gateway server , at least one content server , and a plurality of clients each having a keypad and a micro-browser , said speech recognition server system comprising : a hotkey , disposed on the keypad , for initializing a voice session ;
a vocoder for generating voice frame data responsive to an input speech ;
a client speech subroutine , coupled to said vocoder , for performing speech feature extraction on said voice frame data and to generate digitized voice signals (digital audio, digital audio signal) therefrom ;
a system-specific profile database for storing and transmitting system-specific client profiles ;
a payload formatter , communicable with said client speech subroutine and said system-specific profile database , for formatting a client payload data flow received from said client speech subroutine with data received from said system-specific profile database ;
a speech recognition server , communicable with the gateway server for speech recognition of the formatted client payload ;
a transaction protocol (TP) socket , communicable with said payload formatter and the gateway server , for receiving the formatted client payload from said payload formatter , converting the client payload to a wireless speech TP query , and transmitting the wireless speech TP query via the gateway server through the communications network to said speech recognition server , and further for receiving a recognized wireless speech TP query from said speech recognition server , converting the recognized wireless speech TP query to a resource identifier , and transmitting the resource identifier to the micro-browser for identifying the resource responsive to the resource identifier ;
a wireless transaction protocol socket , communicable with the micro-browser and gateway server , for receiving the resource query from the micro-browser , generating a wireless session resource query , and transmitting the resource query via the gateway server and through the communications network to the contents server , and further for receiving content from the content server via the site server , the communications network , and the gateway server , and transmitting the content via the micro-browser to the client for display ;
and an event handler , communicable with said hotkey , said client speech subroutine , said TP socket , the micro-browser , and said payload formatter , for transmitting event command signals and synchronizing the voice session thereamong .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (mobile phones) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow (data flow) .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .

US20020091527A1
CLAIM 52
. A speech recognition server system for implementation in a communications network having at least one site server , at least one gateway server , at least one content server , and a plurality of clients each having a keypad and a micro-browser , said speech recognition server system comprising : a hotkey , disposed on the keypad , for initializing a voice session ;
a vocoder for generating voice frame data responsive to an input speech ;
a client speech subroutine , coupled to said vocoder , for performing speech feature extraction on said voice frame data and to generate digitized voice signals therefrom ;
a system-specific profile database for storing and transmitting system-specific client profiles ;
a payload formatter , communicable with said client speech subroutine and said system-specific profile database , for formatting a client payload data flow (data flow) received from said client speech subroutine with data received from said system-specific profile database ;
a speech recognition server , communicable with the gateway server for speech recognition of the formatted client payload ;
a transaction protocol (TP) socket , communicable with said payload formatter and the gateway server , for receiving the formatted client payload from said payload formatter , converting the client payload to a wireless speech TP query , and transmitting the wireless speech TP query via the gateway server through the communications network to said speech recognition server , and further for receiving a recognized wireless speech TP query from said speech recognition server , converting the recognized wireless speech TP query to a resource identifier , and transmitting the resource identifier to the micro-browser for identifying the resource responsive to the resource identifier ;
a wireless transaction protocol socket , communicable with the micro-browser and gateway server , for receiving the resource query from the micro-browser , generating a wireless session resource query , and transmitting the resource query via the gateway server and through the communications network to the contents server , and further for receiving content from the content server via the site server , the communications network , and the gateway server , and transmitting the content via the micro-browser to the client for display ;
and an event handler , communicable with said hotkey , said client speech subroutine , said TP socket , the micro-browser , and said payload formatter , for transmitting event command signals and synchronizing the voice session thereamong .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (mobile phones) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (mobile phones) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US20020091527A1
CLAIM 3
. The speech recognition server system of claim 2 wherein the telephone handsets comprise wireless mobile phones (audio signal) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20020077830A1

Filed: 2000-12-19     Issued: 2002-06-20

Method for activating context sensitive speech recognition in a terminal

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Oyj

Riku Suomela, Juha Lehikoinen
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (one second) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US20020077830A1
CLAIM 10
. The method of claim 1 , wherein said step (d) further comprises receiving at least one second (first processor) command via speech recognition during the speech recognition time period and saving said at least one second command in a command buffer .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (one second) .
US20020077830A1
CLAIM 10
. The method of claim 1 , wherein said step (d) further comprises receiving at least one second (first processor) command via speech recognition during the speech recognition time period and saving said at least one second command in a command buffer .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (secondary input) whether the first processor (one second) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US20020077830A1
CLAIM 10
. The method of claim 1 , wherein said step (d) further comprises receiving at least one second (first processor) command via speech recognition during the speech recognition time period and saving said at least one second command in a command buffer .
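Claim 3 recites dynamic scheduling of whether the first processor runs the front end or the search stage, based on availability of distance results and buffer space. A minimal sketch of one such scheduling decision follows; the policy, names, and buffer limit are hypothetical illustrations.

```python
from collections import deque

def schedule_step(feature_buf, distance_buf, max_buf=8):
    """Decide which code a shared processor should run next.

    Illustrative policy: run the search stage when distance results are
    waiting; otherwise run the front end while there is buffer space for
    more feature vectors; otherwise stall.
    """
    if distance_buf:                    # distance results available -> search
        return "search"
    if len(feature_buf) < max_buf:      # room to compute more feature vectors
        return "front_end"
    return "stall"                      # nothing runnable; wait for space
```

A scheduler of this shape lets a single processor interleave front-end and search work, which is the alternation recited in claim 2.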

US20020077830A1
CLAIM 27
. A terminal capable of speech recognition , comprising : a central processing unit ;
a memory unit connected to said central processing unit ;
a primary input connected to said central processing unit for receiving inputted commands ;
a secondary input (dynamic scheduling, comprising dynamic scheduling) connected to said central processing unit for receiving audible commands ;
a speech recognition algorithm connected to said central processing unit for executing speech recognition ;
and a primary control circuit connected to said central processing unit for processing said inputted and audible commands and activating speech recognition in response to an event for a speech recognition time period and deactivating speech recognition after the speech recognition time period has elapsed .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (one second) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US20020077830A1
CLAIM 10
. The method of claim 1 , wherein said step (d) further comprises receiving at least one second (first processor) command via speech recognition during the speech recognition time period and saving said at least one second command in a command buffer .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (said input) adapted to implement frame dropping , to discard one or more audio time frames .
US20020077830A1
CLAIM 27
. A terminal capable of speech recognition , comprising : a central processing unit ;
a memory unit connected to said central processing unit ;
a primary input connected to said central processing unit for receiving inputted commands ;
a secondary input connected to said central processing unit for receiving audible commands ;
a speech recognition algorithm connected to said central processing unit for executing speech recognition ;
and a primary control circuit connected to said central processing unit for processing said inputted (control means) and audible commands and activating speech recognition in response to an event for a speech recognition time period and deactivating speech recognition after the speech recognition time period has elapsed .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6415256B1

Filed: 2000-11-27     Issued: 2002-07-02

Integrated handwriting and speech recognition systems

(Original Assignee) Richard Joseph Ditzik     (Current Assignee) NETAIRUS TECHNOLOGIES LLC

Richard Joseph Ditzik
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (said display screen) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input data into computer recognizable data under control of said computer system ;
c) recognize said speech data (audio time frame, time frame) by identifying best matches to known words or phases of a spoken language and output recognized speech text or data ;
d) receive handwriting data from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface for example Windows™ on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (more display) whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (recognition accuracy) and/or availability of space for storing more feature vectors and/or distance results .
US6415256B1
CLAIM 13
. A computer program as recited in claim 1 , in which said computer program controls two or more displayable (dynamic scheduling, digital audio, digital audio signal) cursors that are independently controlled by said computer program and user inputs .

US6415256B1
CLAIM 14
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control certain program functions of computer system , including data input and data output of data ;
b) receive speech input spoken by a user and convert said speech input into computer recognizable data under control of said computer system ;
c) recognize said speech data by identifying best matches to known words or phases of a spoken language , wherein said received speech and recognized speech forms speech input means ;
d) receive handwriting data from a user via a pen input device under control of said computer system and recognize said handwriting data by identifying best matches to known written words or symbols , wherein the meaning of said handwriting data is either consistent with or inconsistent with said speech input data ;
and e) combine recognition results of said handwriting and said speech recognition data , in a manner to provide output text or data with improved recognition accuracy (distance results) .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (writing data, said input) adapted to implement frame dropping , to discard one or more audio time frames .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input (control means, result memory) data into computer recognizable data under control of said computer system ;
c) recognize said speech data by identifying best matches to known words or phases of a spoken language and output recognized speech text or data ;
d) receive handwriting data (control means, result memory) from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .
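The "frame dropping" control recited in claim 6 of US7979277B2 (discarding one or more audio time frames) can be read as a backlog-limited buffer that sheds load when the downstream recognizer falls behind. The sketch below is illustrative only; the class name, backlog threshold, and drop policy are assumptions, not the patented implementation.

```python
from collections import deque

class FrameDropController:
    """Hypothetical frame-dropping control: audio time frames are discarded
    when the downstream backlog exceeds a fixed threshold."""

    def __init__(self, max_backlog=4):
        self.max_backlog = max_backlog
        self.queue = deque()
        self.dropped = 0

    def submit(self, frame):
        # Drop the incoming frame rather than let the backlog grow unbounded.
        if len(self.queue) >= self.max_backlog:
            self.dropped += 1
            return False
        self.queue.append(frame)
        return True

    def consume(self):
        # Hand the oldest buffered frame to the recognizer, if any.
        return self.queue.popleft() if self.queue else None

# With a backlog of 2 and no consumer running, frames 2-4 are discarded.
ctrl = FrameDropController(max_backlog=2)
results = [ctrl.submit(f) for f in range(5)]
```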

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (said display screen) comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input data into computer recognizable data under control of said computer system ;
c) recognize said speech data (audio time frame, time frame) by identifying best matches to known words or phrases of a spoken language and output recognized speech text or data ;
d) receive handwriting data from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters , images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface , for example Windows™ , on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .
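Claim 7 of US7979277B2 recites a feature vector comprising "a plurality of spectral components of an audio signal for a predetermined time frame". This can be illustrated with a naive DFT over one fixed-length frame. The bin count, frame length, and O(N²) transform are illustrative assumptions; a practical front end would typically use an FFT plus further processing (e.g. mel filtering and cepstral coefficients).

```python
import math

def spectral_feature_vector(frame, n_bins=8):
    """Naive DFT magnitudes: one spectral component per bin, for one frame."""
    n = len(frame)
    feats = []
    for k in range(n_bins):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        feats.append(math.hypot(re, im))
    return feats

# A 256-sample frame of a pure tone concentrates its energy in one bin.
frame = [math.sin(2 * math.pi * 4 * t / 256) for t in range(256)]
fv = spectral_feature_vector(frame)
```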

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (said display screen) from the front end .
US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface for example Windows™ on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (writing data, said input) .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input (control means, result memory) data into computer recognizable data under control of said computer system ;
c) recognize said speech data by identifying best matches to known words or phrases of a spoken language and output recognized speech text or data ;
d) receive handwriting data (control means, result memory) from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (more display) signal .
US6415256B1
CLAIM 13
. A computer program as recited in claim 1 , in which said computer program controls two or more displayable (dynamic scheduling, digital audio, digital audio signal) cursors that are independently controlled by said computer program and user inputs .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (said display screen) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input data into computer recognizable data under control of said computer system ;
c) recognize said speech data (audio time frame, time frame) by identifying best matches to known words or phrases of a spoken language and output recognized speech text or data ;
d) receive handwriting data from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters , images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface , for example Windows™ , on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .
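The three-stage pipeline recited in claims 14-16 of US7979277B2 (audio front end → distance calculation → search stage, with pipelined data flow) can be sketched with threads and queues. The stage internals below are placeholders (a one-element "feature vector", absolute-difference "distances" to three scalar "acoustic states", and a best-state "search"); they illustrate only the pipelined connection, not the patented implementation.

```python
import queue
import threading

def front_end(frames, out_q):
    # Emit one "feature vector" per audio time frame, then a sentinel.
    for f in frames:
        out_q.put([float(f)])
    out_q.put(None)

def distance_stage(in_q, out_q, states=(0.0, 1.0, 2.0)):
    # Distance of each feature vector to each acoustic state.
    while (fv := in_q.get()) is not None:
        out_q.put([abs(fv[0] - s) for s in states])
    out_q.put(None)

def search_stage(in_q, results):
    # Pick the best-matching (minimum-distance) state per frame.
    while (dists := in_q.get()) is not None:
        results.append(dists.index(min(dists)))

q1, q2, results = queue.Queue(), queue.Queue(), []
threads = [
    threading.Thread(target=front_end, args=([0, 1, 2, 2], q1)),
    threading.Thread(target=distance_stage, args=(q1, q2)),
    threading.Thread(target=search_stage, args=(q2, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded queues between stages are also where claim 10's "accelerator signals the search stage when distances are available" naturally maps: a blocking `get` is exactly that availability signal.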

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector (said display screen) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input data into computer recognizable data under control of said computer system ;
c) recognize said speech data (audio time frame, time frame) by identifying best matches to known words or phrases of a spoken language and output recognized speech text or data ;
d) receive handwriting data from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters , images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface , for example Windows™ , on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector (said display screen) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (said display screen) , to the distance calculation , and to the word identification (more port) .
US6415256B1
CLAIM 1
. A computer program , residing on one or more computer-readable mediums , comprising instructions for causing at least one computer system to : a) control data and information flow to and from said computer system and at least one user interface ;
b) receive speech input data spoken by a user via speech input means and convert said input data into computer recognizable data under control of said computer system ;
c) recognize said speech data (audio time frame, time frame) by identifying best matches to known words or phrases of a spoken language and output recognized speech text or data ;
d) receive handwriting data from a user via a pen input means under control of said computer system , convert this data to electronic ink form of data and , at the option of the user , select recognition of said handwriting data ;
e) relate said recognized speech data with said recognized handwriting data , at the option of said user , so that enhanced understanding of said information is accomplished ;
and f) format for display said recognized speech data , said recognized handwriting data , or said converted electronic ink data .

US6415256B1
CLAIM 16
. A computer program as recited in claim 14 , further causing the computer system to a) receive user input to select one or more portions (word identification) of said output text or data ;
and b) process additional cycles of speech input functions or pen input functions for additional information recognition on said user selected portion of said output text or data .

US6415256B1
CLAIM 20
. A method of controlling a user interface for viewing and control of a speech and pen data processing system , said method comprising the steps of : a) displaying text , characters , images , and/or graphics on a display screen of a display device ;
b) running an operating system supporting a graphic user interface , for example Windows™ , on said display screen (feature vector, feature calculation) of said display device ;
c) accepting pen input data from a pen input system and showing said pen input data on said display screen ;
d) accepting speech input from a speech input means , recognizing the speech input and showing recognized speech text on said display screen ;
and e) displaying , at the option of the user , system setup data of said speech and pen data processing system on said display screen .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1107227A2

Filed: 2000-11-21     Issued: 2001-06-13

Voice processing

(Original Assignee) Sony Corp     (Current Assignee) Sony Corp

Yasuharu Asano, Hongchang Pao
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (control means, input voice) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1107227A2
CLAIM 1
A voice processing device built into a robot , said voice processing device comprising : voice processing means for processing voice ;
and control means (calculating circuit, control means, distance calculation, feature calculation) for controlling voice processing by said voice processing means , based on the state of said robot .

EP1107227A2
CLAIM 6
A voice processing device according to Claim 1 , wherein said voice processing means extracts the pitch information or phonemics information of the input voice (calculating circuit, control means, distance calculation, feature calculation) ;
and wherein the emotion state of said robot is changed based on said pitch information or phonemics information , or said robot takes actions corresponding to said pitch information or phonemics information .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (control means, input voice) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP1107227A2
CLAIM 1
A voice processing device built into a robot , said voice processing device comprising : voice processing means for processing voice ;
and control means (calculating circuit, control means, distance calculation, feature calculation) for controlling voice processing by said voice processing means , based on the state of said robot .

EP1107227A2
CLAIM 6
A voice processing device according to Claim 1 , wherein said voice processing means extracts the pitch information or phonemics information of the input voice (calculating circuit, control means, distance calculation, feature calculation) ;
and wherein the emotion state of said robot is changed based on said pitch information or phonemics information , or said robot takes actions corresponding to said pitch information or phonemics information .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (control means, input voice) adapted to implement frame dropping , to discard one or more audio time frames .
EP1107227A2
CLAIM 1
A voice processing device built into a robot , said voice processing device comprising : voice processing means for processing voice ;
and control means (calculating circuit, control means, distance calculation, feature calculation) for controlling voice processing by said voice processing means , based on the state of said robot .

EP1107227A2
CLAIM 6
A voice processing device according to Claim 1 , wherein said voice processing means extracts the pitch information or phonemics information of the input voice (calculating circuit, control means, distance calculation, feature calculation) ;
and wherein the emotion state of said robot is changed based on said pitch information or phonemics information , or said robot takes actions corresponding to said pitch information or phonemics information .

US7979277B2
CLAIM 8
. The speech recognition circuit of claim 1 , wherein the processor is configured to divert to another task if the data flow (recording program) stalls .
EP1107227A2
CLAIM 11
A recording medium recording programs (data flow) to be executed by a computer , for causing a robot to perform voice processing , said program comprising : a voice processing step for processing voice ;
and a control step for controlling voice processing in said voice processing step , based on the state of said robot .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
EP1107227A2
CLAIM 7
A voice processing device according to Claim 1 , wherein said voice processing means comprises voice recognizing means for recognizing input voice ;
and wherein said robot takes actions corresponding to the reliability of the voice recognition (speech accelerator) results output from said voice recognizing means , or the emotion state of said robot is changed based on said reliability .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow (recording program) .
EP1107227A2
CLAIM 11
A recording medium recording programs (data flow) to be executed by a computer , for causing a robot to perform voice processing , said program comprising : a voice processing step for processing voice ;
and a control step for controlling voice processing in said voice processing step , based on the state of said robot .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (control means, input voice) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1107227A2
CLAIM 1
A voice processing device built into a robot , said voice processing device comprising : voice processing means for processing voice ;
and control means (calculating circuit, control means, distance calculation, feature calculation) for controlling voice processing by said voice processing means , based on the state of said robot .

EP1107227A2
CLAIM 6
A voice processing device according to Claim 1 , wherein said voice processing means extracts the pitch information or phonemics information of the input voice (calculating circuit, control means, distance calculation, feature calculation) ;
and wherein the emotion state of said robot is changed based on said pitch information or phonemics information , or said robot takes actions corresponding to said pitch information or phonemics information .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (control means, input voice) , to the distance calculation (control means, input voice) , and to the word identification .
EP1107227A2
CLAIM 1
A voice processing device built into a robot , said voice processing device comprising : voice processing means for processing voice ;
and control means (calculating circuit, control means, distance calculation, feature calculation) for controlling voice processing by said voice processing means , based on the state of said robot .

EP1107227A2
CLAIM 6
A voice processing device according to Claim 1 , wherein said voice processing means extracts the pitch information or phonemics information of the input voice (calculating circuit, control means, distance calculation, feature calculation) ;
and wherein the emotion state of said robot is changed based on said pitch information or phonemics information , or said robot takes actions corresponding to said pitch information or phonemics information .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1293428A

Filed: 2000-11-10     Issued: 2001-05-02

基于语音识别的信息校核方法 (Information verification method based on speech recognition)

(Original Assignee) Tsinghua University     (Current Assignee) Tsinghua University ; Qinghua University ; QINGHUA UNIV

刘加, 单翼翔, 刘润生
US7979277B2
CLAIM 1
. A speech recognition circuit (识别结果) , comprising : an audio front end for calculating a feature vector from an audio signal (语音命令) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (语音命令) ;

a calculating circuit (计算机中) for calculating distances (计算机中) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition according to the present invention , comprising the following parts : endpoint detection of the speech signal and extraction of speech recognition parameters ; prior training of a speaker-independent speech recognition model ; speaker-independent speech recognition ; a speech recognition confidence measure and rejection model ; speaker-adaptive learning for speaker-independent speech recognition ; generation of speech recognition entries ; and voice prompting ; specifically comprising the following steps : A. Endpoint detection of the speech signal and extraction of speech recognition parameters : (1) the speech signal is sampled by the computer's sound card A/D into a raw digital speech signal ; (2) the raw digital speech signal undergoes spectral shaping and framing with windowing , to ensure the quasi-stationarity of the framed speech ; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal , removing speech frames in silent regions to ensure the validity of the speech features of each frame ; (4) speech (recognition) features are extracted from the framed and windowed speech signal ; B. Prior training of the speaker-independent speech recognition model : (1) a large amount of speech data is collected in advance to build a training speech database , the collected speech matching the language of the speech to be recognized ; (2) speech feature parameters are extracted from the speech signals in said database , and are then converted on a PC , through a prior learning process , into the parameters of the recognition model ; the recognition model is a phoneme-based Hidden Markov Model (HMM) , trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion ; C. Speaker-independent speech recognition : (1) said speech features are pattern-matched against the recognition model , and the three best recognition results (识别结果) (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm ; all useful "keyword" information is retained during the recognition search , so no backtracking is needed ; (2) as speech information is input , each time an item of that speech information is verified , the pronunciation template corresponding to that entry is automatically pruned , reducing the search space so as to improve the recognition speed and accuracy of the verification process ; the language model of the recognition process is a multi-subtree trigram word-pair grammar ; D. Speech recognition confidence measure and rejection model : the computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search ; by judging the confidence of the recognized speech , it is determined whether the recognition result is accepted or rejected , while irrelevant speech occurring during operation is rejected ; E. Speaker-adaptive learning for speaker-independent speech recognition : the recognition model is adjusted by a speaker adaptation method ; said adaptation method uses the maximum a posteriori probability approach , iteratively and progressively correcting the recognition template parameters ; F. Generation of speech recognition entries : from the text data to be verified , pronunciation templates for the entries to be recognized are generated automatically with the aid of a pronunciation dictionary ; the input speech information is compared with these pronunciation templates through the aforementioned speaker-independent speech recognition ; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding Hanyu Pinyin , and is stored in the computer (计算机中) (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames) in advance ; G. Voice prompting : voice prompts are produced using speech synthesis ; the analysis and extraction of the speech synthesis model parameters is completed on the computer by prior processing and stored on the computer's hard disk for synthesis , the synthesis model being a code-excited speech coding model ; the voice prompt plays back the recognition result : if the played-back speech matches the input speech , the recognition result is correct ; if not , the user is asked to read in the voice command (语音命令) (audio signal, audio time frame) again , and recognition of that voice command is repeated .
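Step A(3) of CN1293428A claim 1 performs endpoint detection by short-time energy, discarding frames from silent regions. A minimal sketch, assuming an energy-per-sample measure and an arbitrary frame size and threshold (both illustrative, not taken from the patent):

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(x * x for x in frame) / len(frame)

def drop_silent_frames(samples, frame_len=4, threshold=0.01):
    """Split into fixed-length frames and keep only voiced (high-energy) ones."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [f for f in frames if short_time_energy(f) > threshold]

signal = [0.0, 0.0, 0.0, 0.0,    # silence
          0.5, -0.4, 0.6, -0.5,  # speech-like activity
          0.0, 0.0, 0.0, 0.0]    # silence
voiced = drop_silent_frames(signal)
```

Only the middle frame survives, mirroring the claim's removal of speech frames in silent regions before feature extraction.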

US7979277B2
CLAIM 2
. A speech recognition circuit (识别结果) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition according to the present invention , comprising the following parts : endpoint detection of the speech signal and extraction of speech recognition parameters ; prior training of a speaker-independent speech recognition model ; speaker-independent speech recognition ; a speech recognition confidence measure and rejection model ; speaker-adaptive learning for speaker-independent speech recognition ; generation of speech recognition entries ; and voice prompting ; specifically comprising the following steps : A. Endpoint detection of the speech signal and extraction of speech recognition parameters : (1) the speech signal is sampled by the computer's sound card A/D into a raw digital speech signal ; (2) the raw digital speech signal undergoes spectral shaping and framing with windowing , to ensure the quasi-stationarity of the framed speech ; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal , removing speech frames in silent regions to ensure the validity of the speech features of each frame ; (4) speech (recognition) features are extracted from the framed and windowed speech signal ; B. Prior training of the speaker-independent speech recognition model : (1) a large amount of speech data is collected in advance to build a training speech database , the collected speech matching the language of the speech to be recognized ; (2) speech feature parameters are extracted from the speech signals in said database , and are then converted on a PC , through a prior learning process , into the parameters of the recognition model ; the recognition model is a phoneme-based Hidden Markov Model (HMM) , trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion ; C. Speaker-independent speech recognition : (1) said speech features are pattern-matched against the recognition model , and the three best recognition results (识别结果) (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm ; all useful "keyword" information is retained during the recognition search , so no backtracking is needed ; (2) as speech information is input , each time an item of that speech information is verified , the pronunciation template corresponding to that entry is automatically pruned , reducing the search space so as to improve the recognition speed and accuracy of the verification process ; the language model of the recognition process is a multi-subtree trigram word-pair grammar ; D. Speech recognition confidence measure and rejection model : the computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search ; by judging the confidence of the recognized speech , it is determined whether the recognition result is accepted or rejected , while irrelevant speech occurring during operation is rejected ; E. Speaker-adaptive learning for speaker-independent speech recognition : the recognition model is adjusted by a speaker adaptation method ; said adaptation method uses the maximum a posteriori probability approach , iteratively and progressively correcting the recognition template parameters ; F. Generation of speech recognition entries : from the text data to be verified , pronunciation templates for the entries to be recognized are generated automatically with the aid of a pronunciation dictionary ; the input speech information is compared with these pronunciation templates through the aforementioned speaker-independent speech recognition ; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding Hanyu Pinyin , and is stored in the computer (计算机中) in advance ; G. Voice prompting : voice prompts are produced using speech synthesis ; the analysis and extraction of the speech synthesis model parameters is completed on the computer by prior processing and stored on the computer's hard disk for synthesis , the synthesis model being a code-excited speech coding model ; the voice prompt plays back the recognition result : if the played-back speech matches the input speech , the recognition result is correct ; if not , the user is asked to read in the voice command (语音命令) again , and recognition of that voice command is repeated .

US7979277B2
CLAIM 3
. A speech recognition circuit (识别结果) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (识别结果) and/or availability of space for storing more feature vectors and/or distance results .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition according to the present invention , comprising the following parts : endpoint detection of the speech signal and extraction of speech recognition parameters ; prior training of a speaker-independent speech recognition model ; speaker-independent speech recognition ; a speech recognition confidence measure and rejection model ; speaker-adaptive learning for speaker-independent speech recognition ; generation of speech recognition entries ; and voice prompting ; specifically comprising the following steps : A. Endpoint detection of the speech signal and extraction of speech recognition parameters : (1) the speech signal is sampled by the computer's sound card A/D into a raw digital speech signal ; (2) the raw digital speech signal undergoes spectral shaping and framing with windowing , to ensure the quasi-stationarity of the framed speech ; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal , removing speech frames in silent regions to ensure the validity of the speech features of each frame ; (4) speech (recognition) features are extracted from the framed and windowed speech signal ; B. Prior training of the speaker-independent speech recognition model : (1) a large amount of speech data is collected in advance to build a training speech database , the collected speech matching the language of the speech to be recognized ; (2) speech feature parameters are extracted from the speech signals in said database , and are then converted on a PC , through a prior learning process , into the parameters of the recognition model ; the recognition model is a phoneme-based Hidden Markov Model (HMM) , trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion ; C. Speaker-independent speech recognition : (1) said speech features are pattern-matched against the recognition model , and the three best recognition results (识别结果) (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm ; all useful "keyword" information is retained during the recognition search , so no backtracking is needed ; (2) as speech information is input , each time an item of that speech information is verified , the pronunciation template corresponding to that entry is automatically pruned , reducing the search space so as to improve the recognition speed and accuracy of the verification process ; the language model of the recognition process is a multi-subtree trigram word-pair grammar ; D. Speech recognition confidence measure and rejection model : the computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search ; by judging the confidence of the recognized speech , it is determined whether the recognition result is accepted or rejected , while irrelevant speech occurring during operation is rejected ; E. Speaker-adaptive learning for speaker-independent speech recognition : the recognition model is adjusted by a speaker adaptation method ; said adaptation method uses the maximum a posteriori probability approach , iteratively and progressively correcting the recognition template parameters ; F. Generation of speech recognition entries : from the text data to be verified , pronunciation templates for the entries to be recognized are generated automatically with the aid of a pronunciation dictionary ; the input speech information is compared with these pronunciation templates through the aforementioned speaker-independent speech recognition ; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding Hanyu Pinyin , and is stored in the computer (计算机中) in advance ; G. Voice prompting : voice prompts are produced using speech synthesis ; the analysis and extraction of the speech synthesis model parameters is completed on the computer by prior processing and stored on the computer's hard disk for synthesis , the synthesis model being a code-excited speech coding model ; the voice prompt plays back the recognition result : if the played-back speech matches the input speech , the recognition result is correct ; if not , the user is asked to read in the voice command (语音命令) again , and recognition of that voice command is repeated .
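The dynamic scheduling of US7979277B2 claim 3 decides whether the first processor runs front-end or search-stage code based on availability of distance results and of space for more feature vectors. A hedged sketch of such a policy (the function and flag names are illustrative assumptions, not claim language):

```python
def pick_task(distance_results_ready, feature_buffer_free):
    """Choose the first processor's next task from two availability flags."""
    if distance_results_ready:
        return "search"      # distance results are available: consume them
    if feature_buffer_free:
        return "front_end"   # buffer space is available: produce feature vectors
    return "idle"            # stalled: nothing to consume, nowhere to produce

schedule = [pick_task(d, f) for d, f in
            [(True, True), (False, True), (False, False)]]
```

The "idle" branch is where claim 8's behavior (the processor diverts to another task if the data flow stalls) would attach.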

US7979277B2
CLAIM 4
. A speech recognition circuit (识别结果) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition according to the present invention , comprising the following parts : endpoint detection of the speech signal and extraction of speech recognition parameters ; prior training of a speaker-independent speech recognition model ; speaker-independent speech recognition ; a speech recognition confidence measure and rejection model ; speaker-adaptive learning for speaker-independent speech recognition ; generation of speech recognition entries ; and voice prompting ; specifically comprising the following steps : A. Endpoint detection of the speech signal and extraction of speech recognition parameters : (1) the speech signal is sampled by the computer's sound card A/D into a raw digital speech signal ; (2) the raw digital speech signal undergoes spectral shaping and framing with windowing , to ensure the quasi-stationarity of the framed speech ; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal , removing speech frames in silent regions to ensure the validity of the speech features of each frame ; (4) speech (recognition) features are extracted from the framed and windowed speech signal ; B. Prior training of the speaker-independent speech recognition model : (1) a large amount of speech data is collected in advance to build a training speech database , the collected speech matching the language of the speech to be recognized ; (2) speech feature parameters are extracted from the speech signals in said database , and are then converted on a PC , through a prior learning process , into the parameters of the recognition model ; the recognition model is a phoneme-based Hidden Markov Model (HMM) , trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion ; C. Speaker-independent speech recognition : (1) said speech features are pattern-matched against the recognition model , and the three best recognition results (识别结果) (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm ; all useful "keyword" information is retained during the recognition search , so no backtracking is needed ; (2) as speech information is input , each time an item of that speech information is verified , the pronunciation template corresponding to that entry is automatically pruned , reducing the search space so as to improve the recognition speed and accuracy of the verification process ; the language model of the recognition process is a multi-subtree trigram word-pair grammar ; D. Speech recognition confidence measure and rejection model : the computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search ; by judging the confidence of the recognized speech , it is determined whether the recognition result is accepted or rejected , while irrelevant speech occurring during operation is rejected ; E. Speaker-adaptive learning for speaker-independent speech recognition : the recognition model is adjusted by a speaker adaptation method ; said adaptation method uses the maximum a posteriori probability approach , iteratively and progressively correcting the recognition template parameters ; F. Generation of speech recognition entries : from the text data to be verified , pronunciation templates for the entries to be recognized are generated automatically with the aid of a pronunciation dictionary ; the input speech information is compared with these pronunciation templates through the aforementioned speaker-independent speech recognition ; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding Hanyu Pinyin , and is stored in the computer (计算机中) in advance ; G. Voice prompting : voice prompts are produced using speech synthesis ; the analysis and extraction of the speech synthesis model parameters is completed on the computer by prior processing and stored on the computer's hard disk for synthesis , the synthesis model being a code-excited speech coding model ; the voice prompt plays back the recognition result : if the played-back speech matches the input speech , the recognition result is correct ; if not , the user is asked to read in the voice command (语音命令) again , and recognition of that voice command is repeated .

US7979277B2
CLAIM 5
. A speech recognition circuit (识别结果) as claimed in claim 1 , wherein the said calculating circuit (计算机中) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition according to the present invention , comprising the following parts : endpoint detection of the speech signal and extraction of speech recognition parameters ; prior training of a speaker-independent speech recognition model ; speaker-independent speech recognition ; a speech recognition confidence measure and rejection model ; speaker-adaptive learning for speaker-independent speech recognition ; generation of speech recognition entries ; and voice prompting ; specifically comprising the following steps : A. Endpoint detection of the speech signal and extraction of speech recognition parameters : (1) the speech signal is sampled by the computer's sound card A/D into a raw digital speech signal ; (2) the raw digital speech signal undergoes spectral shaping and framing with windowing , to ensure the quasi-stationarity of the framed speech ; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal , removing speech frames in silent regions to ensure the validity of the speech features of each frame ; (4) speech (recognition) features are extracted from the framed and windowed speech signal ; B. Prior training of the speaker-independent speech recognition model : (1) a large amount of speech data is collected in advance to build a training speech database , the collected speech matching the language of the speech to be recognized ; (2) speech feature parameters are extracted from the speech signals in said database , and are then converted on a PC , through a prior learning process , into the parameters of the recognition model ; the recognition model is a phoneme-based Hidden Markov Model (HMM) , trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion ; C. Speaker-independent speech recognition : (1) said speech features are pattern-matched against the recognition model , and the three best recognition results (识别结果) (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm ; all useful "keyword" information is retained during the recognition search , so no backtracking is needed ; (2) as speech information is input , each time an item of that speech information is verified , the pronunciation template corresponding to that entry is automatically pruned , reducing the search space so as to improve the recognition speed and accuracy of the verification process ; the language model of the recognition process is a multi-subtree trigram word-pair grammar ; D. Speech recognition confidence measure and rejection model : the computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search ; by judging the confidence of the recognized speech , it is determined whether the recognition result is accepted or rejected , while irrelevant speech occurring during operation is rejected ; E. Speaker-adaptive learning for speaker-independent speech recognition : the recognition model is adjusted by a speaker adaptation method ; said adaptation method uses the maximum a posteriori probability approach , iteratively and progressively correcting the recognition template parameters ; F. Generation of speech recognition entries : from the text data to be verified , pronunciation templates for the entries to be recognized are generated automatically with the aid of a pronunciation dictionary ; the input speech information is compared with these pronunciation templates through the aforementioned speaker-independent speech recognition ; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding Hanyu Pinyin , and is stored in the computer (计算机中) (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames) in advance ; G. Voice prompting : voice prompts are produced using speech synthesis ; the analysis and extraction of the speech synthesis model parameters is completed on the computer by prior processing and stored on the computer's hard disk for synthesis , the synthesis model being a code-excited speech coding model ; the voice prompt plays back the recognition result : if the played-back speech matches the input speech , the recognition result is correct ; if not , the user is asked to read in the voice command (语音命令) again , and recognition of that voice command is repeated .

US7979277B2
CLAIM 6
. The speech recognition circuit (识别结果) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.

US7979277B2
CLAIM 7
. The speech recognition circuit (识别结果) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (音命令) for a predetermined time frame .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command (audio signal, audio time frame) again and recognition of that voice command is repeated.

US7979277B2
CLAIM 8
. The speech recognition circuit (识别结果) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.

US7979277B2
CLAIM 9
. The speech recognition circuit (识别结果) of claim 1 , wherein the speech accelerator (语音识别方法, 语音编码) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding (speech accelerator) model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.

CN1293428A
CLAIM 2
. The speech recognition method based on information verification (speech accelerator) as claimed in claim 1, characterized in that said endpoint detection of the speech signal and extraction of speech recognition parameters use a detection method combining a speech/noise maximum-likelihood decider with a waveform-trend decider; and said extracted speech recognition feature parameters are MFCC feature vector parameters computed according to the auditory characteristics of the human ear.
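CN1293428A's claim 2 specifies MFCC feature vectors as the recognition parameters. For one windowed frame, the standard computation is power spectrum, triangular mel filterbank, log, then DCT-II. The sketch below is illustrative only; the sample rate (8 kHz), filter count (20), and number of cepstra (12) are assumptions, not values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=8000, n_filters=20, n_ceps=12):
    """Minimal single-frame MFCC: power spectrum -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Filter edge frequencies equally spaced on the mel scale
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        up = (freqs - lo) / (mid - lo)      # rising edge of the triangle
        down = (hi - freqs) / (hi - mid)    # falling edge
        fbank[i] = np.clip(np.minimum(up, down), 0.0, None)
    logmel = np.log(fbank @ spec + 1e-10)
    # DCT-II basis, keeping the first n_ceps coefficients
    n = n_filters
    dct = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :] * np.arange(n_ceps)[:, None])
    return dct @ logmel

coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * np.arange(256) / 8000.0))
```

In a full front end, per-frame vectors like `coeffs` (often with delta features appended) form the feature vector stream consumed by the recognizer.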

US7979277B2
CLAIM 10
. The speech recognition circuit (识别结果) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.

US7979277B2
CLAIM 11
. The speech recognition circuit (识别结果) of claim 1 , comprising increasing the pipeline depth by computing extra front frames (计算机中) in advance .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames); G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.

US7979277B2
CLAIM 12
. The speech recognition circuit (识别结果) of claim 1 , wherein the audio front end is configured to input a digital audio signal (音命令) .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command (audio signal, audio time frame) again and recognition of that voice command is repeated.

US7979277B2
CLAIM 13
. A speech recognition circuit (识别结果) of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance; G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command again and recognition of that voice command is repeated.
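US7979277B2 claim 13 limits the claimed distance to a Mahalanobis distance between the feature vector and a Gaussian acoustic state. For the diagonal-covariance Gaussians typical of HMM recognizers this reduces to a variance-weighted Euclidean distance; the sketch below is a generic illustration, not either patent's implementation:

```python
import numpy as np

def mahalanobis_distance(x, mean, var):
    """Mahalanobis distance between a feature vector and a Gaussian acoustic
    state with diagonal covariance (per-dimension variances `var`)."""
    diff = np.asarray(x, float) - np.asarray(mean, float)
    return float(np.sqrt(np.sum(diff * diff / np.asarray(var, float))))

# A dimension with small variance penalizes the same deviation more
# heavily than a dimension with large variance.
d_tight = mahalanobis_distance([1.0, 0.0], mean=[0.0, 0.0], var=[0.25, 1.0])
d_loose = mahalanobis_distance([0.0, 1.0], mean=[0.0, 0.0], var=[0.25, 1.0])
```

Here `d_tight` is 2.0 while `d_loose` is 1.0, even though both vectors are unit Euclidean distance from the mean.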

US7979277B2
CLAIM 14
. A speech recognition circuit (识别结果) , comprising : an audio front end for calculating a feature vector from an audio signal (音命令) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (音命令) ;

calculating means (计算机中) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results (speech recognition circuit, distance results) are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames); G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command (audio signal, audio time frame) again and recognition of that voice command is repeated.
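US7979277B2 claims 14-15 recite a front end, a distance-calculating stage, and a search stage connected for pipelined data flow. A toy sketch of that three-stage pipeline, using bounded queues so the stages overlap in time, is shown below; the "feature vectors" and "distances" here are stand-ins and the queue depths are arbitrary assumptions, not anything from the patent:

```python
import queue
import threading

def run_pipeline(frames, n_states=4):
    """Toy pipeline: front end -> distance stage -> search stage,
    connected by bounded queues; None is the end-of-stream sentinel."""
    q_feat, q_dist = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    results = []

    def front_end():
        for f in frames:
            q_feat.put([float(f), float(f) ** 2])  # stand-in feature vector
        q_feat.put(None)

    def distance_stage():
        while (v := q_feat.get()) is not None:
            # one "distance" per acoustic state: |feature - state index|
            q_dist.put([abs(v[0] - s) for s in range(n_states)])
        q_dist.put(None)

    def search_stage():
        while (d := q_dist.get()) is not None:
            results.append(min(range(n_states), key=d.__getitem__))

    threads = [threading.Thread(target=t)
               for t in (front_end, distance_stage, search_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

best_states = run_pipeline([0, 1, 2, 3])
```

The bounded queues model the claim's pipelining: while the search stage consumes distances for frame t, the distance stage can already be computing frame t+1 and the front end frame t+2.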

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (音命令) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (音命令) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (计算机中) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames); G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command (audio signal, audio time frame) again and recognition of that voice command is repeated.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (音命令) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (音命令) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (计算机中) , to the distance calculation (计算机中) , and to the word identification .
CN1293428A
CLAIM 1
. An information verification method based on speech recognition proposed by the present invention, comprising the parts of: endpoint detection of the speech signal and extraction of speech recognition parameters; advance training of a speaker-independent speech recognition model; speaker-independent speech recognition; a speech recognition confidence measure and rejection model; speaker-adaptive learning for speaker-independent speech recognition; generation of speech recognition entries; and voice prompting; the method specifically comprising the following steps: A. Endpoint detection of the speech signal and extraction of speech recognition parameters: (1) the speech signal is sampled through the computer's sound-card A/D converter into an original digital speech signal; (2) the original digital speech signal is subjected to spectral shaping and to framing and windowing, to ensure the quasi-stationarity of the framed speech; (3) endpoint detection is performed using the short-time energy and waveform-trend features of the speech signal, removing speech frames in silent regions to ensure the validity of the speech features of each frame; (4) speech (recognition) features are extracted from the framed and windowed speech signal; B. Advance training of the speaker-independent speech recognition model: (1) a large amount of speech data is collected in advance to build a training speech database, the collected speech being in the same language as the speech to be recognized; (2) speech feature parameters are extracted from the speech signals in said database and then converted into recognition model parameters on a PC through an advance learning process; the recognition model is a phoneme-based Hidden Markov Model (HMM), trained by estimating the HMM parameters (including means and variances) according to the maximum likelihood criterion; C. Speaker-independent speech recognition: (1) said speech features are pattern-matched against the speech recognition model, and the three best recognition results are extracted in real time by an N-best Viterbi frame-synchronous beam search algorithm; all useful "keyword" information is retained during the recognition search, so no backtracking is needed; (2) as speech information is input, each time an item of speech information is verified, the pronunciation template corresponding to that entry is automatically pruned, reducing the search space so as to improve recognition speed and accuracy during verification; the language model used in recognition is a multi-subtree trigram word-pair grammar; D. Speech recognition confidence measure and rejection model: computation of the confidence measure and rejection model is combined with the Viterbi frame-synchronous beam search; by judging the confidence of the recognized speech, it is determined whether to accept or reject the recognition result, and irrelevant speech occurring during operation is rejected at the same time; E. Speaker-adaptive learning for speaker-independent speech recognition: a speaker adaptation method is used to adjust the recognition model; said adaptation method is the maximum a posteriori probability method, which progressively revises the recognition template parameters by iteration; F. Generation of speech recognition entries: pronunciation templates for the entries to be recognized are generated automatically from the text data to be verified, with the aid of a pronunciation dictionary; the input speech information is compared with these pronunciation templates through the foregoing speaker-independent speech recognition; the pronunciation dictionary consists of the Chinese characters of the recognition vocabulary and their corresponding pinyin, and is stored in the computer in advance (calculating circuit, calculating means, feature calculation, distance calculation, calculating distances, computing extra front frames); G. Voice prompting: speech synthesis is used for voice prompting; the analysis and extraction of the speech synthesis model parameters is completed on the computer by advance processing, and the parameters are stored on the computer's hard disk for speech synthesis; the speech synthesis model is a code-excited speech coding model; voice prompting is used to play back the recognition result; if the played-back speech matches the input speech, the recognition result is correct; if not, the user is asked to read in the voice command (audio signal, audio time frame) again and recognition of that voice command is repeated.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1100073A2

Filed: 2000-11-08     Issued: 2001-05-16

Classifying audio signals for later data retrieval

(Original Assignee) Sony Corp     (Current Assignee) Sony Corp

Mototsugu Abe, Masayuki Nishiguchi (Sony Corporation)
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (generating means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1100073A2
CLAIM 25
An apparatus for generating descriptors comprising : a blocking means for dividing an input signal into blocks having a predetermined time length ;
a feature extracting means for extracting one or more than one characteristic quantities of a signal attribute from the signal of each block ;
a categorical classifying means for classifying the signal of each block into a category according to the characteristic quantities thereof ;
and a descriptor generating means (calculating circuit) for generating a descriptor for the signal according to the category of classification thereof .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (n points) and/or availability of space for storing more feature vectors and/or distance results .
EP1100073A2
CLAIM 41
The method for retrieving signals according to claim 33 , wherein points (distance results) of changes of the signal are detected by using the descriptor reflecting or corresponding to the result of said categorical classification .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (generating means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP1100073A2
CLAIM 25
An apparatus for generating descriptors comprising : a blocking means for dividing an input signal into blocks having a predetermined time length ;
a feature extracting means for extracting one or more than one characteristic quantities of a signal attribute from the signal of each block ;
a categorical classifying means for classifying the signal of each block into a category according to the characteristic quantities thereof ;
and a descriptor generating means (calculating circuit) for generating a descriptor for the signal according to the category of classification thereof .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (generating means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1100073A2
CLAIM 25
An apparatus for generating descriptors comprising : a blocking means for dividing an input signal into blocks having a predetermined time length ;
a feature extracting means for extracting one or more than one characteristic quantities of a signal attribute from the signal of each block ;
a categorical classifying means for classifying the signal of each block into a category according to the characteristic quantities thereof ;
and a descriptor generating means (calculating circuit) for generating a descriptor for the signal according to the category of classification thereof .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP1189202A1

Filed: 2000-09-18     Issued: 2002-03-20

Duration models for speech recognition

(Original Assignee) Sony International Europe GmbH     (Current Assignee) Sony Deutschland GmbH

Krzysztof Marasek, Silke Goronzy, Ralf Kompe (Advanced Technology Center)
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (following steps) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP1189202A1
CLAIM 5
Method for recognizing speech according to claim 4 , wherein each of said additional or non-duration related features is in each case chosen from the following set of possible features : a feature direction (feature direction) , which is defined to be 1 if a best word , speechelement or the like is found in a forward beam and which is defined to be 0 if it is found in the backward beam , a total number (T , n frames) of frames being contained in the utterance (U , W) , a total number (P , n phones) of phones , speech elements or the like being contained in the utterance (U , W) , a total number (T nosil , n frames nosil) of frames being contained in the utterance (U , W) without silence , an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : a difference (first second) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) , i . e . : first second : = p(W 1 |A) - p(W 2 |A) , a normalized difference (first second l) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by the number (T) of frames being contained in said utterance (U , W) , i . e . : first second l : = first second / T , a normalized difference (first second f) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : first second f : = first second / first score , an average acoustic score (avg) for N-best hypotheses (W 1 , . . . , W N) , i . e . : a normalized average acoustic score (avg l) for N-best hypotheses (W 1 , . . . , W N) being normalized by a number (T) of frames being conteined in said utterance (U , W) , i . e . : avg l : = avg / T a normalized average acoustic score (avg f) for N-best hypotheses (W 1 , . . . , W N) being normalized by an acoustic score (first score) for the first-best hypothesis (W 1) , i . e . 
: avg f : = avg / first score , a difference (first avg) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acosutic score (avg) , i . e . : a normalized difference (first avg l) between an acoustic score (first score) for a first-best hypothesis (W1) and an average acosutic score (avg) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first avg l : = first avg / T , a normalized difference (first avg f) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acosutic score (avg) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) . i . e . : first avg f : = first avg / first score , a difference (first last) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) , i . e . : first last : = first score - last score = p(W 1 |A) - p(W N |A) , a normalized difference (first last 1) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first last l : = first last / T , a normalized difference (first last f) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) , i . e . : first last f : = first last / first score , a normalized score difference (first beambest) for all frames being contained in said utterance(U , W) being normalized by the number (T nosil) of frames being contained in said utterance (U , W) without silence , i . e . 
: where o t in each case is a features vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pbdf) and o OSSt denotes a feature vector (feature vector) for the t th frame in the utterance (U , W) found for the a state belonging to an optimal state sequence (OSS) , a number (first beambest zeros) of frames being contained in said utterance (U , W) for which a score difference is zero , a largest continuous difference (first beambest largest) in said normalized score difference (first beambest) , a best possible score in a beam , i . e . : a score difference (first best) for all frames being contained in said utterance(U , W) taking into account respective transition probabilities (aij) , i . e . : where o t in each case is a features vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pbdf) and o osst denotes a feature vector for the t th frame in the utterance (U , W) found for the a state belonging to an optimal state sequence (OSS) , a worst phone score (worst phonescore) in the first-best hypothesis (W 1) except for silence , i . e . : a normalized worst phone score (worst phonescore b) in the first-best hypothesis (W 1) except for silence being normalized by a best possible phone score (p best) in a beam , i . e . : worst phonescore b : = worst phonescore / p best , where an average phone score (avg phonescore) in the best hypothesis , except for silence , i . e . : a change (stdev phonescore) of a score within one phone , i . e . : a difference (worst phone best) between a best possible phone score (p best) and a worst phone score (worst phonescore) for a best hypothesis , i . e . : worst phone best : = p best - worst phonescore a worst frame score (worst frame score) for all frames in a best hypothesis (W 1) , i . e . 
: a normalized worst frame score (worst frame score b) for all frames in a best hypothesis (W 1) being normalized by a best possible frome score (p t , best) in a beam , i . e . : worst frame score b : = worst frame score / p t , best , where a sum (best in beam) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis , i . e . : a normalized sum (best in beam I) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : best in beam I : = best in beam / T an average acoustic score (avg framescore) for all frames in a best hypothesis , except for silence , i . e . : a signal-to-noise ratio (snr) , particularly computed on C 0 , or the like .
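EP1189202A1's claim 5 enumerates confidence features built from N-best acoustic scores (differences between first-, second-, and last-best hypotheses, raw and normalized by frame count or by the first-best score). A small sketch of a few of those definitions, with arbitrary example scores, is below; the function and values are illustrative, not taken from the EP patent:

```python
def confidence_features(scores, n_frames):
    """Compute a few of the N-best score features from EP1189202A1 claim 5:
    scores are acoustic log-scores for the N-best hypotheses, best first."""
    first, second, last = scores[0], scores[1], scores[-1]
    avg = sum(scores) / len(scores)
    return {
        "first_second": first - second,              # first-best vs second-best
        "first_second_l": (first - second) / n_frames,  # normalized by frame count
        "first_second_f": (first - second) / first,     # normalized by first score
        "first_avg": first - avg,                    # first-best vs N-best average
        "first_last": first - last,                  # first-best vs last-best
    }

feats = confidence_features([-100.0, -110.0, -130.0], n_frames=50)
```

A larger gap between the first- and second-best scores generally indicates higher confidence in the top hypothesis, which is what these features feed into the claim's confidence measure.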

EP1189202A1
CLAIM 9
Method for recognizing speech according to anyone of the preceding claims , wherein further the following steps (calculating means, speech recognition method, calculating distances) are comprised : a) receiving a speech phrase (SP) , b) generating a representing signal (RS) which is representative for said speech phrase (SP) , c) determining from said representing signal (RS) at least a first sequence of speech elements (H1 , . . . , H6) out of a set of given possible speech elements (S1 , . . . , Sn , . . .) , d) generating for each speech elements (H1 , . . . , H6) , and/or for subsets/sub-sequences of said speech elements (H1 , . . . , H6) a duration measure (DS1 , . . . , DS6) in particular for constructing said duration related features (f j) , e) generating and/or outputting at least a first sequence of words from said speech elements (H1 , . . . , H6) which most probably correspond to said speech phrase (SP) , and f) generating and/or outputting a confidence measure (CM) which is at least based on said duration measures (DS1 , . . . , DS6) and/or on said duration related features (f j) and which is representative for the probability of a correct recognition of said speech elements (H1 , . . . , H6) and/or said sequence of words , wherein each of said duration measures (DS1 , . . . , DS6) comprises information at least on whether or not said speech elements (H1 , . . . , H6) , said and/or sub-sets/sub-sequences of speech elements (H1 , . . . , H6) within said representing signal of (RS) are compatible to a given and pre-defined duration model .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (time alignment) on the first processor .
EP1189202A1
CLAIM 13
Method according to claim 4 , characterized in that said determination of the statistical distributions of the durations is based on a forced time alignment (search stage processing) process of said training data .
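Claim 2's pipelining, in which a single first processor alternates between front-end and search-stage work while distances are computed elsewhere, can be sketched roughly as follows. The accelerator interface (`submit`/`collect`) and all stage bodies are assumptions for illustration, not the patent's actual implementation:

```python
# Hedged sketch of claim 2: one processor alternates front-end feature
# extraction and search-stage processing per frame, handing the distance
# computation to a separate unit in between. Names are illustrative.

def run_recognizer(audio_frames, front_end, accelerator, search_stage):
    results = []
    for frame in audio_frames:
        feature_vector = front_end(frame)        # front-end pass on the first processor
        accelerator.submit(feature_vector)       # distances computed on the second processor
        distances = accelerator.collect()        # block until distance results arrive
        results.append(search_stage(distances))  # search-stage pass back on the first processor
    return results
```

The alternation falls out of the loop body: while the accelerator works, the first processor is free to switch between its two roles.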

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (comprises information) adapted to implement frame dropping , to discard one or more audio time frames .
EP1189202A1
CLAIM 9
Method for recognizing speech according to any one of the preceding claims , wherein further the following steps are comprised : a) receiving a speech phrase (SP) , b) generating a representing signal (RS) which is representative for said speech phrase (SP) , c) determining from said representing signal (RS) at least a first sequence of speech elements (H1 , . . . , H6) out of a set of given possible speech elements (S1 , . . . , Sn , . . . ) , d) generating for each speech element (H1 , . . . , H6) , and/or for subsets/sub-sequences of said speech elements (H1 , . . . , H6) a duration measure (DS1 , . . . , DS6) in particular for constructing said duration related features (f j) , e) generating and/or outputting at least a first sequence of words from said speech elements (H1 , . . . , H6) which most probably correspond to said speech phrase (SP) , and f) generating and/or outputting a confidence measure (CM) which is at least based on said duration measures (DS1 , . . . , DS6) and/or on said duration related features (f j) and which is representative for the probability of a correct recognition of said speech elements (H1 , . . . , H6) and/or said sequence of words , wherein each of said duration measures (DS1 , . . . , DS6) comprises information (control means) at least on whether or not said speech elements (H1 , . . . , H6) , and/or said sub-sets/sub-sequences of speech elements (H1 , . . . , H6) within said representing signal (RS) are compatible to a given and pre-defined duration model .
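Claim 6's frame dropping, discarding one or more audio time frames under control-means direction, reduces to a simple selection policy. The "keep every Nth frame" policy below is an assumption chosen for clarity; the claim does not specify which frames are discarded:

```python
# Illustrative sketch of frame dropping (US7979277B2, claim 6): a control
# policy discards audio time frames, here every frame whose index is not a
# multiple of `keep_every`. The policy itself is an assumption.

def drop_frames(frames, keep_every=2):
    """Keep one frame out of every `keep_every` frames; discard the rest."""
    return [f for i, f in enumerate(frames) if i % keep_every == 0]
```

In a real circuit the decision would typically be load-driven (drop only when the search stage falls behind) rather than fixed-rate.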

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector (feature vector) comprises a plurality of spectral components of an audio signal for a predetermined time frame .
EP1189202A1
CLAIM 5
Method for recognizing speech according to claim 4 , wherein each of said additional or non-duration related features is in each case chosen from the following set of possible features : a feature direction (feature direction) , which is defined to be 1 if a best word , speech element or the like is found in a forward beam and which is defined to be 0 if it is found in the backward beam , a total number (T , n frames) of frames being contained in the utterance (U , W) , a total number (P , n phones) of phones , speech elements or the like being contained in the utterance (U , W) , a total number (T nosil , n frames nosil) of frames being contained in the utterance (U , W) without silence , an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : a difference (first second) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) , i . e . : first second : = p(W 1 |A) - p(W 2 |A) , a normalized difference (first second l) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by the number (T) of frames being contained in said utterance (U , W) , i . e . : first second l : = first second / T , a normalized difference (first second f) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : first second f : = first second / first score , an average acoustic score (avg) for N-best hypotheses (W 1 , . . . , W N) , i . e . : a normalized average acoustic score (avg l) for N-best hypotheses (W 1 , . . . , W N) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : avg l : = avg / T , a normalized average acoustic score (avg f) for N-best hypotheses (W 1 , . . . , W N) being normalized by an acoustic score (first score) for the first-best hypothesis (W 1) , i . e . 
: avg f : = avg / first score , a difference (first avg) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) , i . e . : a normalized difference (first avg l) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first avg l : = first avg / T , a normalized difference (first avg f) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) , i . e . : first avg f : = first avg / first score , a difference (first last) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) , i . e . : first last : = first score - last score = p(W 1 |A) - p(W N |A) , a normalized difference (first last l) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first last l : = first last / T , a normalized difference (first last f) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) , i . e . : first last f : = first last / first score , a normalized score difference (first beambest) for all frames being contained in said utterance (U , W) being normalized by the number (T nosil) of frames being contained in said utterance (U , W) without silence , i . e . 
: where o t in each case is a feature vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pdf) and o OSSt denotes a feature vector (feature vector) for the t th frame in the utterance (U , W) found for a state belonging to an optimal state sequence (OSS) , a number (first beambest zeros) of frames being contained in said utterance (U , W) for which a score difference is zero , a largest continuous difference (first beambest largest) in said normalized score difference (first beambest) , a best possible score in a beam , i . e . : a score difference (first best) for all frames being contained in said utterance (U , W) taking into account respective transition probabilities (aij) , i . e . : where o t in each case is a feature vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pdf) and o OSSt denotes a feature vector for the t th frame in the utterance (U , W) found for a state belonging to an optimal state sequence (OSS) , a worst phone score (worst phonescore) in the first-best hypothesis (W 1) except for silence , i . e . : a normalized worst phone score (worst phonescore b) in the first-best hypothesis (W 1) except for silence being normalized by a best possible phone score (p best) in a beam , i . e . : worst phonescore b : = worst phonescore / p best , where an average phone score (avg phonescore) in the best hypothesis , except for silence , i . e . : a change (stdev phonescore) of a score within one phone , i . e . : a difference (worst phone best) between a best possible phone score (p best) and a worst phone score (worst phonescore) for a best hypothesis , i . e . : worst phone best : = p best - worst phonescore , a worst frame score (worst frame score) for all frames in a best hypothesis (W 1) , i . e . 
: a normalized worst frame score (worst frame score b) for all frames in a best hypothesis (W 1) being normalized by a best possible frame score (p t , best) in a beam , i . e . : worst frame score b : = worst frame score / p t , best , where a sum (best in beam) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis , i . e . : a normalized sum (best in beam l) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : best in beam l : = best in beam / T , an average acoustic score (avg framescore) for all frames in a best hypothesis , except for silence , i . e . : a signal-to-noise ratio (snr) , particularly computed on C 0 , or the like .
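Claim 7's feature vector of spectral components for one time frame can be sketched with a plain discrete Fourier transform magnitude per bin. This is a simplified stand-in: a practical front end would usually apply windowing and mel filterbanks or MFCCs, which the claim does not mandate:

```python
# Sketch of claim 7: a feature vector holding spectral components of the
# audio signal for one predetermined time frame. Plain DFT magnitudes are
# used here for clarity; real front ends typically use mel/MFCC features.
import cmath

def spectral_feature_vector(samples, num_bins=8):
    """Return the magnitude of the first `num_bins` DFT bins of one frame."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, s in enumerate(samples)))
            for k in range(num_bins)]
```

For a constant (DC) frame, all energy lands in bin 0, which is a quick sanity check on the transform.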

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (feature vector) from the front end .
EP1189202A1
CLAIM 5
Method for recognizing speech according to claim 4 , wherein each of said additional or non-duration related features is in each case chosen from the following set of possible features : a feature direction (feature direction) , which is defined to be 1 if a best word , speech element or the like is found in a forward beam and which is defined to be 0 if it is found in the backward beam , a total number (T , n frames) of frames being contained in the utterance (U , W) , a total number (P , n phones) of phones , speech elements or the like being contained in the utterance (U , W) , a total number (T nosil , n frames nosil) of frames being contained in the utterance (U , W) without silence , an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : a difference (first second) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) , i . e . : first second : = p(W 1 |A) - p(W 2 |A) , a normalized difference (first second l) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by the number (T) of frames being contained in said utterance (U , W) , i . e . : first second l : = first second / T , a normalized difference (first second f) in acoustic scores (first score , second score) between first- (W 1) and second-best hypotheses (W 2) being normalized by an acoustic score (first score) for a first-best hypothesis (W 1) , i . e . : first second f : = first second / first score , an average acoustic score (avg) for N-best hypotheses (W 1 , . . . , W N) , i . e . : a normalized average acoustic score (avg l) for N-best hypotheses (W 1 , . . . , W N) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : avg l : = avg / T , a normalized average acoustic score (avg f) for N-best hypotheses (W 1 , . . . , W N) being normalized by an acoustic score (first score) for the first-best hypothesis (W 1) , i . e . 
: avg f : = avg / first score , a difference (first avg) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) , i . e . : a normalized difference (first avg l) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first avg l : = first avg / T , a normalized difference (first avg f) between an acoustic score (first score) for a first-best hypothesis (W 1) and an average acoustic score (avg) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) , i . e . : first avg f : = first avg / first score , a difference (first last) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) , i . e . : first last : = first score - last score = p(W 1 |A) - p(W N |A) , a normalized difference (first last l) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : first last l : = first last / T , a normalized difference (first last f) between acoustic scores (first score , last score) for a first-best hypothesis (W 1) and for a last-best hypothesis (W N) being normalized by said acoustic score (first score) for said first-best hypothesis (W 1) , i . e . : first last f : = first last / first score , a normalized score difference (first beambest) for all frames being contained in said utterance (U , W) being normalized by the number (T nosil) of frames being contained in said utterance (U , W) without silence , i . e . 
: where o t in each case is a feature vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pdf) and o OSSt denotes a feature vector (feature vector) for the t th frame in the utterance (U , W) found for a state belonging to an optimal state sequence (OSS) , a number (first beambest zeros) of frames being contained in said utterance (U , W) for which a score difference is zero , a largest continuous difference (first beambest largest) in said normalized score difference (first beambest) , a best possible score in a beam , i . e . : a score difference (first best) for all frames being contained in said utterance (U , W) taking into account respective transition probabilities (aij) , i . e . : where o t in each case is a feature vector for the t th frame in the utterance (U , W) with t running from 1 to T , b in each case is a Gaussian probability density function (pdf) and o OSSt denotes a feature vector for the t th frame in the utterance (U , W) found for a state belonging to an optimal state sequence (OSS) , a worst phone score (worst phonescore) in the first-best hypothesis (W 1) except for silence , i . e . : a normalized worst phone score (worst phonescore b) in the first-best hypothesis (W 1) except for silence being normalized by a best possible phone score (p best) in a beam , i . e . : worst phonescore b : = worst phonescore / p best , where an average phone score (avg phonescore) in the best hypothesis , except for silence , i . e . : a change (stdev phonescore) of a score within one phone , i . e . : a difference (worst phone best) between a best possible phone score (p best) and a worst phone score (worst phonescore) for a best hypothesis , i . e . : worst phone best : = p best - worst phonescore , a worst frame score (worst frame score) for all frames in a best hypothesis (W 1) , i . e . 
: a normalized worst frame score (worst frame score b) for all frames in a best hypothesis (W 1) being normalized by a best possible frame score (p t , best) in a beam , i . e . : worst frame score b : = worst frame score / p t , best , where a sum (best in beam) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis , i . e . : a normalized sum (best in beam l) of the differences between best frame scores for the hypotheses in the beam and for the best hypothesis being normalized by a number (T) of frames being contained in said utterance (U , W) , i . e . : best in beam l : = best in beam / T , an average acoustic score (avg framescore) for all frames in a best hypothesis , except for silence , i . e . : a signal-to-noise ratio (snr) , particularly computed on C 0 , or the like .
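Claim 9's interrupt handshake, where the accelerator signals the front end that it is ready for the next feature vector, can be modelled in software with an event flag standing in for the interrupt line. The class, its queue-based inbox, and the placeholder distance computation are all illustrative assumptions:

```python
# Hedged sketch of claim 9's handshake: the accelerator raises a "ready"
# signal (modelled as a threading.Event) to tell the front end it can accept
# the next feature vector. Names and the distance placeholder are illustrative.
import threading
import queue

class Accelerator:
    def __init__(self):
        self.ready = threading.Event()    # stands in for the interrupt line
        self.inbox = queue.Queue(maxsize=1)
        self.ready.set()                  # ready to accept the first vector

    def send_feature_vector(self, fv):
        self.ready.wait()                 # front end blocks until interrupted
        self.ready.clear()
        self.inbox.put(fv)

    def process_one(self):
        fv = self.inbox.get()
        distances = [x * x for x in fv]   # placeholder distance computation
        self.ready.set()                  # raise the "ready" interrupt again
        return distances
```

The essential behaviour is the flow control: the front end never overruns the accelerator because it waits on the ready signal before each transfer.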

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (said determination, speech signal) .
EP1189202A1
CLAIM 13
Method according to claim 4 , characterized in that said determination (result memory) of the statistical distributions of the durations is based on a forced time alignment process of said training data .

EP1189202A1
CLAIM 14
Method according to any of the claims 4 or 5 , characterized in that during the determination of the statistical distribution of the durations the training data are provided as an input speech signal (result memory) together with the correct transcription thereof and that then the recognizing process is forced to recognize the correct transcription .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (following steps) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP1189202A1
CLAIM 9
Method for recognizing speech according to any one of the preceding claims , wherein further the following steps (calculating means, speech recognition method, calculating distances) are comprised : a) receiving a speech phrase (SP) , b) generating a representing signal (RS) which is representative for said speech phrase (SP) , c) determining from said representing signal (RS) at least a first sequence of speech elements (H1 , . . . , H6) out of a set of given possible speech elements (S1 , . . . , Sn , . . . ) , d) generating for each speech element (H1 , . . . , H6) , and/or for subsets/sub-sequences of said speech elements (H1 , . . . , H6) a duration measure (DS1 , . . . , DS6) in particular for constructing said duration related features (f j) , e) generating and/or outputting at least a first sequence of words from said speech elements (H1 , . . . , H6) which most probably correspond to said speech phrase (SP) , and f) generating and/or outputting a confidence measure (CM) which is at least based on said duration measures (DS1 , . . . , DS6) and/or on said duration related features (f j) and which is representative for the probability of a correct recognition of said speech elements (H1 , . . . , H6) and/or said sequence of words , wherein each of said duration measures (DS1 , . . . , DS6) comprises information at least on whether or not said speech elements (H1 , . . . , H6) , and/or said sub-sets/sub-sequences of speech elements (H1 , . . . , H6) within said representing signal (RS) are compatible to a given and pre-defined duration model .
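Claim 14's three connected stages (audio front end, distance calculation, search stage) with pipelined data flow can be sketched with queues linking the stages so that different frames are in flight at once. All stage bodies and names are placeholder assumptions:

```python
# Minimal sketch of claim 14's pipelined data flow: front end -> distance
# calculation -> search stage, linked by queues so each stage can work on a
# different frame concurrently. Stage functions are placeholders.
import queue
import threading

def pipeline(frames, front_end, calc_distance, search):
    q1, q2, out = queue.Queue(), queue.Queue(), []
    def stage_a():                      # audio front end
        for f in frames:
            q1.put(front_end(f))
        q1.put(None)                    # end-of-stream marker
    def stage_b():                      # distance calculating means
        while (fv := q1.get()) is not None:
            q2.put(calc_distance(fv))
        q2.put(None)
    threads = [threading.Thread(target=stage_a), threading.Thread(target=stage_b)]
    for t in threads:
        t.start()
    while (d := q2.get()) is not None:  # search stage consumes distances
        out.append(search(d))
    for t in threads:
        t.join()
    return out
```

The queues correspond to the claim's "connected to each other to enable pipelined data flow"; in hardware these would be result memories or FIFOs rather than software queues.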

US7979277B2
CLAIM 15
. A speech recognition method (following steps) , comprising : calculating a feature vector (feature vector) from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP1189202A1
CLAIM 9
Method for recognizing speech according to any one of the preceding claims , wherein further the following steps (calculating means, speech recognition method, calculating distances) are comprised : a) receiving a speech phrase (SP) , b) generating a representing signal (RS) which is representative for said speech phrase (SP) , c) determining from said representing signal (RS) at least a first sequence of speech elements (H1 , . . . , H6) out of a set of given possible speech elements (S1 , . . . , Sn , . . . ) , d) generating for each speech element (H1 , . . . , H6) , and/or for subsets/sub-sequences of said speech elements (H1 , . . . , H6) a duration measure (DS1 , . . . , DS6) in particular for constructing said duration related features (f j) , e) generating and/or outputting at least a first sequence of words from said speech elements (H1 , . . . , H6) which most probably correspond to said speech phrase (SP) , and f) generating and/or outputting a confidence measure (CM) which is at least based on said duration measures (DS1 , . . . , DS6) and/or on said duration related features (f j) and which is representative for the probability of a correct recognition of said speech elements (H1 , . . . , H6) and/or said sequence of words , wherein each of said duration measures (DS1 , . . . , DS6) comprises information at least on whether or not said speech elements (H1 , . . . , H6) , and/or said sub-sets/sub-sequences of speech elements (H1 , . . . , H6) within said representing signal (RS) are compatible to a given and pre-defined duration model .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (following steps) , the code comprising : code for controlling the processor to calculate a feature vector (feature vector) from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
EP1189202A1
CLAIM 9
Method for recognizing speech according to any one of the preceding claims , wherein further the following steps (calculating means, speech recognition method, calculating distances) are comprised : a) receiving a speech phrase (SP) , b) generating a representing signal (RS) which is representative for said speech phrase (SP) , c) determining from said representing signal (RS) at least a first sequence of speech elements (H1 , . . . , H6) out of a set of given possible speech elements (S1 , . . . , Sn , . . . ) , d) generating for each speech element (H1 , . . . , H6) , and/or for subsets/sub-sequences of said speech elements (H1 , . . . , H6) a duration measure (DS1 , . . . , DS6) in particular for constructing said duration related features (f j) , e) generating and/or outputting at least a first sequence of words from said speech elements (H1 , . . . , H6) which most probably correspond to said speech phrase (SP) , and f) generating and/or outputting a confidence measure (CM) which is at least based on said duration measures (DS1 , . . . , DS6) and/or on said duration related features (f j) and which is representative for the probability of a correct recognition of said speech elements (H1 , . . . , H6) and/or said sequence of words , wherein each of said duration measures (DS1 , . . . , DS6) comprises information at least on whether or not said speech elements (H1 , . . . , H6) , and/or said sub-sets/sub-sequences of speech elements (H1 , . . . , H6) within said representing signal (RS) are compatible to a given and pre-defined duration model .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6718308B1

Filed: 2000-07-07     Issued: 2004-04-06

Media presentation system controlled by voice to text commands

(Original Assignee) Daniel L. Nolting     

Daniel L. Nolting
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (first direction) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6718308B1
CLAIM 13
. A system utilizing a computer for enabling a user to vocally assemble and display a plurality of media searched and retrieved from variable external data media sources and manipulated based on preferences of said user , comprising : a voice recognition module working in conjunction with said computer for converting an inputted voice utterance into computer-readable text in the form of a plurality of search commands , manipulation commands , and navigation commands ;
filtering means for taking each of said search commands after such conversion by said voice-recognition means and identifying one of said external data media sources for committing and retrieving said media in the form of search results , said filtering means further comprising a first directional means (first processor) for activating each of a plane search , a media capture , and a second directional means ;
a results file having a results counter ;
means for providing that said results file calculate an amount of said search results produced by said search command performed by said user for said media ;
means for creating a table for said search results ;
platform means for allowing said user to see a layout of said media after said search results are retrieved from said external data media sources ;
means for moving each of said search results independently out of said table onto said platform means ;
means for activating a juxtaposition process to allow each of said manipulation commands to be performed by said user on said search results after such removal out of said table onto said platform means ;
and , means for providing a mirror image of said media after said manipulation commands are performed as a preliminary view before an epic view projection of said media , wherein said epic view projection utilizes said navigation commands as part of a presentation program .
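Claim 1's calculating circuit computes distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states. A common concrete choice, used here as an illustrative assumption since the claim does not mandate any particular metric, is the negative log-likelihood under a diagonal-covariance Gaussian per state:

```python
# Illustrative sketch of claim 1's calculating circuit: a distance per
# acoustic state, here a diagonal-covariance Gaussian negative log-likelihood
# (one common choice; the claim itself does not fix the metric).
import math

def state_distances(feature_vector, states):
    """states: list of (means, variances) pairs, one pair per acoustic state."""
    distances = []
    for means, variances in states:
        d = sum((x - m) ** 2 / v + math.log(2 * math.pi * v)
                for x, m, v in zip(feature_vector, means, variances))
        distances.append(0.5 * d)  # smaller distance = closer match
    return distances
```

The search stage would then consume one such distance vector per frame when scoring paths through the lexical tree.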

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (first direction) .
US6718308B1
CLAIM 13
. A system utilizing a computer for enabling a user to vocally assemble and display a plurality of media searched and retrieved from variable external data media sources and manipulated based on preferences of said user , comprising : a voice recognition module working in conjunction with said computer for converting an inputted voice utterance into computer-readable text in the form of a plurality of search commands , manipulation commands , and navigation commands ;
filtering means for taking each of said search commands after such conversion by said voice-recognition means and identifying one of said external data media sources for committing and retrieving said media in the form of search results , said filtering means further comprising a first directional means (first processor) for activating each of a plane search , a media capture , and a second directional means ;
a results file having a results counter ;
means for providing that said results file calculate an amount of said search results produced by said search command performed by said user for said media ;
means for creating a table for said search results ;
platform means for allowing said user to see a layout of said media after said search results are retrieved from said external data media sources ;
means for moving each of said search results independently out of said table onto said platform means ;
means for activating a juxtaposition process to allow each of said manipulation commands to be performed by said user on said search results after such removal out of said table onto said platform means ;
and , means for providing a mirror image of said media after said manipulation commands are performed as a preliminary view before an epic view projection of said media , wherein said epic view projection utilizes said navigation commands as part of a presentation program .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (first direction) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6718308B1
CLAIM 13
. A system utilizing a computer for enabling a user to vocally assemble and display a plurality of media searched and retrieved from variable external data media sources and manipulated based on preferences of said user , comprising : a voice recognition module working in conjunction with said computer for converting an inputted voice utterance into computer-readable text in the form of a plurality of search commands , manipulation commands , and navigation commands ;
filtering means for taking each of said search commands after such conversion by said voice-recognition means and identifying one of said external data media sources for committing and retrieving said media in the form of search results , said filtering means further comprising a first directional means (first processor) for activating each of a plane search , a media capture , and a second directional means ;
a results file having a results counter ;
means for providing that said results file calculate an amount of said search results produced by said search command performed by said user for said media ;
means for creating a table for said search results ;
platform means for allowing said user to see a layout of said media after said search results are retrieved from said external data media sources ;
means for moving each of said search results independently out of said table onto said platform means ;
means for activating a juxtaposition process to allow each of said manipulation commands to be performed by said user on said search results after such removal out of said table onto said platform means ;
and , means for providing a mirror image of said media after said manipulation commands are performed as a preliminary view before an epic view projection of said media , wherein said epic view projection utilizes said navigation commands as part of a presentation program .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (first direction) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6718308B1
CLAIM 13
. A system utilizing a computer for enabling a user to vocally assemble and display a plurality of media searched and retrieved from variable external data media sources and manipulated based on preferences of said user , comprising : a voice recognition module working in conjunction with said computer for converting an inputted voice utterance into computer-readable text in the form of a plurality of search commands , manipulation commands , and navigation commands ;
filtering means for taking each of said search commands after such conversion by said voice-recognition means and identifying one of said external data media sources for committing and retrieving said media in the form of search results , said filtering means further comprising a first directional (first processor) means for activating each of a plane search , a media capture , and a second directional means ;
a results file having a results counter ;
means for providing that said results file calculate an amount of said search results produced by said search command performed by said user for said media ;
means for creating a table for said search results ;
platform means for allowing said user to see a layout of said media after said search results are retrieved from said external data media sources ;
means for moving each of said search results independently out of said table onto said platform means ;
means for activating a juxtaposition process to allow each of said manipulation commands to be performed by said user on said search results after such removal out of said table onto said platform means ;
and , means for providing a mirror image of said media after said manipulation commands are performed as a preliminary view before an epic view projection of said media , wherein said epic view projection utilizes said navigation commands as part of a presentation program .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6718308B1
CLAIM 1
. A system utilizing a computer for enabling a user to vocally assemble , manipulate , and display a plurality of media searched and retrieved from variable external data media sources based on preferences of said user , comprising : a voice recognition (speech accelerator) module working in conjunction with said computer for converting an inputted voice utterance into computer-readable text in the form of a plurality of search commands , manipulation commands , and navigation commands ;
filtering means for taking said search commands after such conversion by said voice-recognition means and identifying one of said external data media sources for committing and retrieving said media ;
juxtapositioning means for preparing said media after such retrieving by said filtering means for local display ;
platform means whereon said media is vocally manipulated and organized based on each of said manipulation commands performed by said user after such preparing from said juxtapositioning means ;
and , means for providing a mirror image of said media after such manipulation and organization as a preliminary view before an epic view projection of said media wherein said epic view projection utilizes each of said navigation commands as part of a presentation program .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (desired characteristic) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6718308B1
CLAIM 27
. The method of claim 26 , wherein for the step of filtering said search command , multiple directionals are simultaneously performing a series of pre-set scripts for a desired characteristic (calculating means) of said media .
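The three-stage pipelined data flow recited in '277 claim 14 (audio front end, distance calculation, search stage connected for pipelined data flow) can be illustrated with a minimal sketch. This is an illustrative assumption about one software realization, not the patent's implementation; the feature extraction, Euclidean distance, and best-state "search" below are toy stand-ins, and all names are hypothetical.

```python
# Toy three-stage pipeline: front end -> distance calculation -> search stage.
# Generators let each downstream stage pull data as it becomes available,
# modeling the pipelined data flow between the stages.

def front_end(frames):
    """Audio front end: derive a feature vector per audio time frame (toy features)."""
    for frame in frames:
        yield [sum(frame) / len(frame), max(frame) - min(frame)]

def distance_stage(feature_vectors, acoustic_states):
    """Distance calculation: similarity of each vector to each acoustic state."""
    for fv in feature_vectors:
        yield [sum((a - b) ** 2 for a, b in zip(fv, state))
               for state in acoustic_states]

def search_stage(distance_sets):
    """Search stage: pick the best-scoring state index per frame
    (a stand-in for lexical-tree search)."""
    return [min(range(len(d)), key=d.__getitem__) for d in distance_sets]

frames = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]          # two audio time frames
states = [[0.2, 0.2], [0.8, 0.2]]                    # two predetermined acoustic states
best = search_stage(distance_stage(front_end(frames), states))
```

Because each stage is a generator, a frame can be in the distance stage while the front end prepares the next one, which is the essence of the claimed pipelining.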




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6757718B1

Filed: 2000-06-30     Issued: 2004-06-29

Mobile navigation of network-based electronic information using spoken input

(Original Assignee) SRI International Inc     (Current Assignee) IPA TECHNOLOGIES INC.

Christine Halverson, Luc Julia, Dimitris Voutsas, Adam Cheyer
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (cellular telephone) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6757718B1
CLAIM 5
. The method of claim 1 , wherein the data link includes a cellular telephone (calculating circuit) system .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (cellular telephone) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6757718B1
CLAIM 5
. The method of claim 1 , wherein the data link includes a cellular telephone (calculating circuit) system .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (control device) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6757718B1
CLAIM 1
. A method for speech-based navigation of an electronic data source located at one or more network servers located remotely from a user , wherein a data link is established between a mobile information appliance of the user and the one or more network servers , comprising the steps of : (a) receiving a spoken request for desired information from the user utilizing the mobile information appliance of the user , wherein said mobile information appliance comprises a portable remote control device (calculating means) or a set-top box for a television ;
(b) rendering an interpretation of the spoken request ;
(c) constructing a navigation query based upon the interpretation ;
(d) utilizing the navigation query to select a portion of the electronic data source ;
and (e) transmitting the selected portion of the electronic data source from the network server to the mobile information appliance of the user .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (cellular telephone) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6757718B1
CLAIM 5
. The method of claim 1 , wherein the data link includes a cellular telephone (calculating circuit) system .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6453252B1

Filed: 2000-05-15     Issued: 2002-09-17

Process for identifying audio content

(Original Assignee) Creative Technology Ltd     (Current Assignee) Creative Technology Ltd

Jean Laroche
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor (selected frequency) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6453252B1
CLAIM 1
. A method of identifying a digital audio signal by monitoring the content of the audio signal , said method comprising the acts of : selecting a set of frequency subbands of said audio signal , with each frequency subband having a selected frequency (second processor) range ;
for each subband , generating subband energy signal having a magnitude , in decibels (dB) , equal to signal energy in the subband ;
forming an energy flux signal for each subband having a magnitude equal to the difference between subband energy signals of neighboring frames ;
determining the magnitude of frequency component bins of the energy flux signal for each subband ;
forming a fingerprint comprising the magnitudes of the frequency component bins of the energy flux signal for all subbands ;
and comparing the fingerprint for the audio file to fingerprints in a database to identify the audio file .
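The fingerprinting steps of US6453252B1 claim 1 (per-subband energy in dB, energy-flux differences between neighboring frames, frequency bins of the flux, concatenation into a fingerprint, database comparison) can be sketched as follows. The subband layout, array shapes, and the Euclidean matching metric are assumptions for illustration only.

```python
# Sketch of the claimed fingerprinting pipeline, assuming subband energies
# in dB are already available as a (n_frames, n_subbands) array.
import numpy as np

def fingerprint(subband_energies_db):
    """Flux between neighboring frames, then magnitudes of its frequency bins."""
    flux = np.diff(subband_energies_db, axis=0)   # neighboring-frame differences
    bins = np.abs(np.fft.rfft(flux, axis=0))      # frequency component bins of the flux
    return bins.ravel()                           # all subbands -> one fingerprint

def identify(fp, database):
    """Return the database key whose stored fingerprint is closest (Euclidean)."""
    return min(database, key=lambda k: np.linalg.norm(database[k] - fp))

rng = np.random.default_rng(0)
e_a = rng.normal(size=(32, 4))                    # toy dB energies for two files
e_b = rng.normal(size=(32, 4))
db = {"song_a": fingerprint(e_a), "song_b": fingerprint(e_b)}
match = identify(fingerprint(e_a), db)
```

The flux step makes the fingerprint insensitive to a constant gain offset, since a uniform dB shift cancels in the frame-to-frame difference.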

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (desired frequency) vector from the front end .
US6453252B1
CLAIM 2
. The method of claim 1 where said step of generating a subband energy signal comprises the acts of : for each subband , filtering the audio signal to obtain a filtered signal (next feature vector) having only frequency components in the subband ;
and calculating the power of the filtered signal .

US6453252B1
CLAIM 4
. The method of claim 3 where said step of generating a subband energy signal comprises the acts of : dividing a segment of the signal into overlapping frames ;
for each frame , determining the magnitude of frequency bins at different frequencies ;
selecting a set of frequency subbands of a desired frequency (next feature) range ;
for each subband and each frame , summing the frequency bins of the frame located within the subband to form a subband energy signal having a magnitude expressed in decibels (dB) for the given frame .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (digital audio signal) .
US6453252B1
CLAIM 1
. A method of identifying a digital audio signal (digital audio signal) by monitoring the content of the audio signal , said method comprising the acts of : selecting a set of frequency subbands of said audio signal , with each frequency subband having a selected frequency range ;
for each subband , generating subband energy signal having a magnitude , in decibels (dB) , equal to signal energy in the subband ;
forming an energy flux signal for each subband having a magnitude equal to the difference between subband energy signals of neighboring frames ;
determining the magnitude of frequency component bins of the energy flux signal for each subband ;
forming a fingerprint comprising the magnitudes of the frequency component bins of the energy flux signal for all subbands ;
and comparing the fingerprint for the audio file to fingerprints in a database to identify the audio file .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
GB2358253A

Filed: 2000-05-10     Issued: 2001-07-18

Signal identification device using genetic algorithm and on-line identification system

(Original Assignee) KYUSHU KYOHAN Co Ltd     (Current Assignee) KYUSHU KYOHAN Co Ltd

Ho Jinyama, Toshio Toyota, Takashi Nishimura
US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (calculating means) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
GB2358253A
CLAIM 1
. An identification apparatus using a genetic algorithm , comprising : means for self-reorganizing primitive feature parameters , which is defined for identifying states of a signal measured by some kind of sensor , by a genetic algorithm to automatically generate a new feature parameter as a GA feature parameter ;
calculating means (calculating means) for calculating a value of said generated GA feature parameter by the use of a measured signal ;
means for evaluating a suitability of said GA feature parameter subjected to said calculation by a Distinction Index or a Similarity Index , and if evaluated as a suitable feature parameter , applying it to an identification operation , or if not so , repeating to define another primitive feature parameters for identifying said signal features or states ;
means for generating said GA feature parameters sequentially when a number of states is to be identified ;
means for determining a possibility distribution function and a criteria for said identification operation to said automatically generated GA feature parameter by a probability theory , Dempster-Shafer probability theory , and possibility theory ;
means for identifying states represented by said signal features by the use of a GA identification device using said suitable GA feature parameter and said criteria ;
means for transferring said GA feature parameter and said possibility distribution function , which are generated by a separated computer , to said GA identification device and identifying said signal features or states by the use of said GA identification device storing said GA feature parameter and said possibility distribution function ;
and means for online identifying said signal features or said states by the use of the signals measured by some kind of sensors and said GA feature parameters in an on-line identification operation .
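The GB2358253A loop of generating candidate feature parameters by a genetic algorithm and keeping only those that score well on a "Distinction Index" can be sketched in miniature. The population size, the weighted-sum feature form, and the class-separation score used as the index are all illustrative assumptions.

```python
# Toy GA that evolves a weighted-sum feature parameter and scores candidates
# with a simple class-separation "Distinction Index".
import random

def distinction_index(weights, class_a, class_b):
    """Separation of two signal classes under a weighted-sum feature."""
    f = lambda sig: sum(w * x for w, x in zip(weights, sig))
    mean_a = sum(map(f, class_a)) / len(class_a)
    mean_b = sum(map(f, class_b)) / len(class_b)
    return abs(mean_a - mean_b)

def evolve_feature(class_a, class_b, n_dims=3, generations=50, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_dims)] for _ in range(20)]
    for _ in range(generations):
        pop.sort(key=lambda w: distinction_index(w, class_a, class_b), reverse=True)
        parents = pop[:10]                       # keep the most suitable parameters
        # crossover plus Gaussian mutation regenerates the rest of the population
        pop = parents + [[(p[i] + q[i]) / 2 + rng.gauss(0, 0.1) for i in range(n_dims)]
                         for p, q in zip(parents, reversed(parents))]
    return max(pop, key=lambda w: distinction_index(w, class_a, class_b))

class_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]     # toy signals of state A
class_b = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]     # toy signals of state B
best = evolve_feature(class_a, class_b)
```

Unsuitable candidates are simply bred over in the next generation, which mirrors the claim's "if not so, repeating to define another primitive feature parameters" branch.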

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (said signal) .
GB2358253A
CLAIM 1
. An identification apparatus using a genetic algorithm , comprising : means for self-reorganizing primitive feature parameters , which is defined for identifying states of a signal measured by some kind of sensor , by a genetic algorithm to automatically generate a new feature parameter as a GA feature parameter ;
calculating means for calculating a value of said generated GA feature parameter by the use of a measured signal ;
means for evaluating a suitability of said GA feature parameter subjected to said calculation by a Distinction Index or a Similarity Index , and if evaluated as a suitable feature parameter , applying it to an identification operation , or if not so , repeating to define another primitive feature parameters for identifying said signal (word identification) features or states ;
means for generating said GA feature parameters sequentially when a number of states is to be identified ;
means for determining a possibility distribution function and a criteria for said identification operation to said automatically generated GA feature parameter by a probability theory , Dempster-Shafer probability theory , and possibility theory ;
means for identifying states represented by said signal features by the use of a GA identification device using said suitable GA feature parameter and said criteria ;
means for transferring said GA feature parameter and said possibility distribution function , which are generated by a separated computer , to said GA identification device and identifying said signal features or states by the use of said GA identification device storing said GA feature parameter and said possibility distribution function ;
and means for online identifying said signal features or said states by the use of the signals measured by some kind of sensors and said GA feature parameters in an on-line identification operation .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6629073B1

Filed: 2000-04-27     Issued: 2003-09-30

Speech recognition method and apparatus utilizing multi-unit models

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Hsiao-Wuen Hon, Kuansan Wang
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .
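The scoring idea in US6629073B1 claim 18, combining per-sub-unit probabilities with a large-unit probability that spans several sub-units before choosing the best hypothesized word, can be modeled with a short interpolation sketch. The log-probability values, the word set, and the interpolation weight are invented for illustration.

```python
# Hedged model: score each hypothesized word by interpolating the summed
# sub-unit log-probabilities with the spanning large-unit log-probability,
# then identify the most likely word.

def word_score(subunit_logps, largeunit_logp, weight=0.5):
    """Interpolate summed sub-unit log-probs with the large-unit log-prob."""
    return (1 - weight) * sum(subunit_logps) + weight * largeunit_logp

hypotheses = {
    # word: (log-probs of its acoustic sub-units, log-prob of the large unit)
    "cat": ([-1.2, -0.8, -1.0], -2.5),
    "cap": ([-1.2, -0.8, -2.9], -4.0),
}
best_word = max(hypotheses, key=lambda w: word_score(*hypotheses[w]))
```

The large-unit term lets a well-matched multi-phone span outvote a single poorly matched sub-unit, which is the motivation the claim's two-level probabilities suggest.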

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .
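'277 claim 4's arrangement, the first processor running the front end and the search stage as separate threads, can be sketched with two threads handing feature vectors through a bounded queue. The "feature extraction" and "search" bodies are placeholders, and all names are assumptions.

```python
# Minimal two-thread sketch: a front-end thread produces feature vectors,
# a search-stage thread consumes them; a bounded queue decouples the two.
import queue
import threading

feature_q = queue.Queue(maxsize=4)   # back-pressure between the threads
results = []

def front_end_thread(frames):
    for frame in frames:
        feature_q.put([x * 2.0 for x in frame])  # toy feature extraction
    feature_q.put(None)                          # end-of-stream sentinel

def search_thread():
    while (fv := feature_q.get()) is not None:
        results.append(max(fv))                  # stand-in for lexical-tree search

frames = [[0.1, 0.2], [0.4, 0.3]]
t1 = threading.Thread(target=front_end_thread, args=(frames,))
t2 = threading.Thread(target=search_thread)
t1.start(); t2.start(); t1.join(); t2.join()
# results now holds one score per frame
```

The bounded queue plays the scheduling role implied by the claim: whichever thread gets ahead blocks, so the single processor's time alternates naturally between front end and search.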

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .
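The handshake of '277 claim 9, where the accelerator raises an interrupt to tell the front end it is ready for the next feature vector, can be modeled in software with a threading.Event standing in for the interrupt line. The class, the toy distance, and the one-vector "slot" are illustrative assumptions.

```python
# Software model of the ready-interrupt handshake between front end and
# accelerator: the front end blocks until the accelerator signals ready,
# delivers one feature vector, and the accelerator re-asserts ready when done.
import threading

class Accelerator:
    def __init__(self):
        self.ready = threading.Event()
        self.ready.set()            # ready for the first feature vector
        self.slot = None
        self.distances = []

    def deliver(self, fv):
        self.ready.wait()           # front end blocks until "interrupt" asserted
        self.ready.clear()
        self.slot = fv
        self.process()

    def process(self):
        self.distances.append(sum(x * x for x in self.slot))  # toy distance
        self.ready.set()            # signal: ready for the next feature vector

acc = Accelerator()
for fv in ([1.0, 2.0], [3.0, 4.0]):
    acc.deliver(fv)
```

In hardware the set/clear of the event would be the interrupt assert/deassert; the point of the handshake is that the front end never overwrites a vector the accelerator has not yet consumed.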

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal (result memory) , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .
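Claim 13 limits the distance of claim 1 to a Mahalanobis distance. As an illustrative sketch only (the patent does not prescribe an implementation), the distance between a feature vector and a Gaussian acoustic state with diagonal covariance, the common case in HMM acoustic models, can be computed as:

```python
import math

def mahalanobis_distance(feature, mean, variance):
    """Mahalanobis distance between a feature vector and a Gaussian
    acoustic state with a diagonal covariance (one variance per dim)."""
    assert len(feature) == len(mean) == len(variance)
    acc = 0.0
    for x, mu, var in zip(feature, mean, variance):
        acc += (x - mu) ** 2 / var          # squared, variance-normalized term
    return math.sqrt(acc)

# Toy 3-dimensional feature vector scored against one acoustic state.
d = mahalanobis_distance([1.0, 2.0, 0.5], [0.0, 2.0, 1.0], [1.0, 4.0, 0.25])
```

With a full covariance matrix the inner sum becomes the quadratic form (x − μ)ᵀ Σ⁻¹ (x − μ); the diagonal case above is what per-state distance accelerators typically evaluate.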

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (training data) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6629073B1
CLAIM 8
. The speech recognition system of claim 1 wherein for a large acoustic unit that appears infrequently in a set of training data (calculating means) , the decoder bases the score on the individual sub-unit probabilities for the acoustic sub-units found in the large acoustic unit instead of a large-unit probability associated with the large acoustic unit .

US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .
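Claim 14 recites an audio front end, calculating means, and a search stage "connected to each other to enable pipelined data flow". A minimal sketch of such a three-stage pipeline, using thread-safe queues as the connections and hypothetical toy stage functions (none of the stage logic below is from the patent), might look like:

```python
from queue import Queue
from threading import Thread

def run_stage(fn, inbox, outbox):
    """Pull items from inbox, process, push to outbox; None ends the stream."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            return
        outbox.put(fn(item))

# Hypothetical toy stand-ins for the three stages (assumptions for illustration).
def front_end(frame):            # audio frame -> "feature vector"
    return [float(s) for s in frame]

def distance_calc(vec):          # feature vector -> distance to one state at 1.0
    return sum(abs(x - 1.0) for x in vec)

def search_stage(dist):          # distance -> trivial accept/prune decision
    return "match" if dist < 2.0 else "prune"

frames_q, feats_q, dists_q, words_q = Queue(), Queue(), Queue(), Queue()
stages = [
    Thread(target=run_stage, args=(front_end, frames_q, feats_q)),
    Thread(target=run_stage, args=(distance_calc, feats_q, dists_q)),
    Thread(target=run_stage, args=(search_stage, dists_q, words_q)),
]
for t in stages:
    t.start()

for frame in ([1, 1], [5, 5], [0, 1]):   # frames flow through all three stages
    frames_q.put(frame)
frames_q.put(None)                        # end of audio

results = []
while (out := words_q.get()) is not None:
    results.append(out)
for t in stages:
    t.join()
```

The point of the sketch is only the connectivity: each stage works on frame N while the previous stage is already producing frame N+1, which is what the claimed pipelined data flow enables.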

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US6629073B1
CLAIM 18
. A method of recognizing speech (speech recognition circuit, word identification) from a series of feature vectors representing a speech signal , the method comprising : determining a set of sub-unit probabilities describing the likelihood of feature vectors spanned by a set of individual acoustic sub-units , the set of individual acoustic sub-units spanning less than all of the feature vectors in the series of feature vectors ;
determining a large-unit probability describing the likelihood of feature vectors spanned by a large acoustic unit , the large acoustic unit representing a combination of at least two acoustic sub-units such that the large acoustic unit spans the feature vectors spanned by the combination of at least two acoustic sub-units ;
and identifying a most likely sequence of hypothesized words based on the sub-unit probabilities and the large-unit probability .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1268732A

Filed: 2000-03-31     Issued: 2000-10-04

Speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip

(Original Assignee) Tsinghua University     (Current Assignee) Tsinghua University ; Qinghua University ; QINGHUA UNIV

刘加, 李晓宇, 史缓缓, 刘润生
US7979277B2
CLAIM 1
. A speech recognition circuit (recognition result, output as result) , comprising : an audio front end for calculating a feature vector from an audio signal (voice command) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (voice command) ;

a calculating circuit (parameter calculation) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

CN1268732A
CLAIM 3
. The speaker-dependent speech recognition and speech playback method of claim 1, wherein the dynamic segmentation and weighted-averaging method used in training the voice commands comprises the following steps: (1) first, the variation of the speech feature parameters between frames is computed (calculating circuit); when the variation exceeds a set threshold, that frame is marked as an important boundary point of the speech features; (2) the number of boundary points may differ for different speech signals; the speech features between boundary points are weighted-averaged to raise the weight of the important speech features in the recognition model.
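Step A(3) of CN1268732A's claim 1 computes LPC cepstral coefficients (LPCC) from the linear predictive model of the speech. The claim does not give the conversion formula; the textbook LPC-to-cepstrum recursion, c_n = a_n + Σ_{k=1}^{n-1} (k/n)·c_k·a_{n-k}, is shown below as a sketch only (not the patent's own implementation):

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a = [a_1 .. a_p] to LPC cepstral
    coefficients via the standard recursion:
        c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}
    (a_n is taken as 0 for n > p)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        cn = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:           # a_{n-k} exists only up to order p
                cn += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(cn)
    return c

# Toy first-order model: c_1 = a_1 = 0.5, c_2 = (1/2)*c_1*a_1 = 0.125
ceps = lpc_to_cepstrum([0.5], 2)
```

The same recursion also yields cepstra beyond the LPC order p, which is why the loop guards the index into `a`.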

US7979277B2
CLAIM 2
. A speech recognition circuit (recognition result, output as result) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 3
. A speech recognition circuit (recognition result, output as result) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (recognition result, output as result) and/or availability of space for storing more feature vectors and/or distance results .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.
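Claim 3 of US7979277B2 recites dynamically scheduling whether the first processor runs the front-end or search-stage code based on availability of distance results and/or of storage space. A deliberately minimal decision-function sketch follows; the function name and the particular priority policy are assumptions for illustration, not the patent's:

```python
def schedule_step(distances_ready, feature_space_free):
    """Pick what the first processor runs next: prefer draining available
    distance results through the search stage, otherwise produce more
    feature vectors while buffer space remains (assumed policy)."""
    if distances_ready:
        return "search_stage"
    if feature_space_free:
        return "front_end"
    return "wait"
```

The ordering (search stage before front end) is just one plausible policy for keeping the distance result store drained; the claim only requires that the decision depend on these availabilities.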

US7979277B2
CLAIM 4
. A speech recognition circuit (recognition result, output as result) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 5
. A speech recognition circuit (recognition result, output as result) as claimed in claim 1 , wherein the said calculating circuit (parameter calculation) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

CN1268732A
CLAIM 3
. The speaker-dependent speech recognition and speech playback method of claim 1, wherein the dynamic segmentation and weighted-averaging method used in training the voice commands comprises the following steps: (1) first, the variation of the speech feature parameters between frames is computed (calculating circuit); when the variation exceeds a set threshold, that frame is marked as an important boundary point of the speech features; (2) the number of boundary points may differ for different speech signals; the speech features between boundary points are weighted-averaged to raise the weight of the important speech features in the recognition model.

US7979277B2
CLAIM 6
. The speech recognition circuit (recognition result, output as result) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 7
. The speech recognition circuit (recognition result, output as result) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (voice command) for a predetermined time frame .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.
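Step A(4) of CN1268732A's claim 1 performs endpoint detection with the zero-crossing rate and short-time energy of the signal, discarding frames in silent regions. A toy sketch follows; the threshold values and the keep/drop policy are assumptions for illustration, not taken from the patent:

```python
def frame_features(frame):
    """Short-time energy and zero-crossing count for one frame of samples."""
    energy = sum(s * s for s in frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return energy, zcr

def is_speech(frame, energy_thresh=0.5, zcr_thresh=10):
    """Keep a frame when either its energy or its zero-crossing count
    exceeds a (hypothetical) threshold; otherwise treat it as silence."""
    energy, zcr = frame_features(frame)
    return energy > energy_thresh or zcr > zcr_thresh

silence = [0.01] * 50                       # low energy, no sign changes
voiced = [0.5, -0.4, 0.6, -0.5] * 12        # high energy, many crossings
kept = [f for f in (silence, voiced) if is_speech(f)]
```

Real systems combine both features with adaptive thresholds; the claim only requires that both features participate in the endpoint decision.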

US7979277B2
CLAIM 8
. The speech recognition circuit (recognition result, output as result) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 9
. The speech recognition circuit (recognition result, output as result) of claim 1 , wherein the speech accelerator (speech coding) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

CN1268732A
CLAIM 5
. The speaker-dependent speech recognition and speech playback method of claim 1, wherein the method of sharing the recognition parameters with the speech coding (speech accelerator) vocal-tract model parameters during speech playback comprises the following steps: (1) the speech recognition model parameters and the speech coding vocal-tract parameters are identical, so no additional storage for vocal-tract model parameters is needed during speech coding; (2) the excitation parameters of the vocal-tract model follow an improved LPC vocoder approach, the excitation parameters being the pitch period and the voiced/unvoiced/transition decision information.

US7979277B2
CLAIM 10
. The speech recognition circuit (recognition result, output as result) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (recognition result, output as result) .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 11
. The speech recognition circuit (recognition result, output as result) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 12
. The speech recognition circuit (recognition result, output as result) of claim 1 , wherein the audio front end is configured to input a digital audio signal (voice command) .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 13
. A speech recognition circuit (recognition result, output as result) of claim 1 , wherein said distance comprises a Mahalanobis distance .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands: (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 14
. A speech recognition circuit (recognition result, output as result) , comprising : an audio front end for calculating a feature vector from an audio signal (voice command) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (voice command) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result (speech recognition circuit, distance results, result memory) is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result (speech recognition circuit, distance results, result memory); (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (voice command) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (voice command) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (parameter calculation) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result; (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.

CN1268732A
CLAIM 3
. The speaker-dependent speech recognition and speech playback method of claim 1, wherein the dynamic segmentation and weighted-averaging method used in training the voice commands comprises the following steps: (1) first, the variation of the speech feature parameters between frames is computed (calculating circuit); when the variation exceeds a set threshold, that frame is marked as an important boundary point of the speech features; (2) the number of boundary points may differ for different speech signals; the speech features between boundary points are weighted-averaged to raise the weight of the important speech features in the recognition model.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (voice command) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (voice command) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
CN1268732A
CLAIM 1
. A speaker-dependent speech recognition and speech playback method based on a dedicated speech recognition chip, comprising A/D sampling, spectrum shaping with windowing and pre-emphasis, feature parameter extraction, endpoint detection, speech recognition template training, and speech playback or speech recognition template matching, wherein the best recognition result is output for playback, characterized in that it comprises the following steps: A. Speech recognition parameter extraction: (1) the input speech signal is sampled by an A/D converter into raw digital speech, with level gain control to ensure high sampling precision; (2) the raw digital speech signal undergoes spectrum shaping and frame-wise windowing to ensure the quasi-stationarity of each speech frame; (3) speech features are extracted from the framed speech, the principal features being the cepstral coefficients (LPCC) computed from the linear predictive (LPC) model of the speech, which are stored for the later dynamic segmentation and template extraction steps; (4) endpoint detection using the zero-crossing rate and short-time energy of the speech signal removes frames in silent regions, ensuring the validity of each frame's speech features; B. Training of speaker-dependent voice commands (audio signal, audio time frame): (1) the extracted speech features are dynamically segmented and weighted-averaged to form template parameters, the weighted parameters constituting a new recognition template; (2) the new template undergoes discriminability analysis to ensure good separability between it and previously trained templates; (3) if the speech remains poorly discriminable after processing, the speaker is prompted to input a new speech signal; C. Recognition of speaker-dependent voice commands: (1) the first four steps of the recognition process are identical to the "speech recognition parameter extraction" process above; (2) the speech features are compared against the stored recognition templates by dynamic matching, and the best-matching voice command is extracted and output as the result; (3) during recognition, if the template matching error exceeds a threshold or the confidence is very low, the result is deemed unreliable and the user is prompted to re-input the speech. D. Speech playback: playback uses speech synthesis technology, sharing the speech recognition parameters with the speech synthesis model parameters, so that the recognition parameters also serve as synthesis model parameters, minimizing system overhead.




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6732142B1

Filed: 2000-01-25     Issued: 2004-05-04

Method and apparatus for audible presentation of web page content

(Original Assignee) International Business Machines Corp     (Current Assignee) Google LLC

Cary Lee Bates, Paul Reuben Day, John Matthew Santosuosso
US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals (time intervals) to the search stage when the distances for a new frame are available in a result memory .
US6732142B1
CLAIM 2
. A method of presenting information from the web , comprising the steps of : selecting web content for audible background presentation on a web client digital device , said web client digital device supporting concurrent execution of a plurality of tasks ;
specifying at least one audible presentation parameter , said at least one audible presentation parameter determining when said selected web content will be audibly presented ;
and audibly presenting said selected web content on said web client digital device at a time determined by said at least one audible presentation parameter , said step of audibly presenting said selected web content being performed as a background task of said plurality of tasks executing on said web client digital device , concurrently with visually presenting independent information on a display of said web client digital device , said independent information being presented as at least one task of said plurality of tasks executing on said web client digital device other than said background task , said independent information being unaffected by said audio presentation ;
wherein said at least one audible presentation parameter comprises a time interval for accessing a web server , and wherein said step of audibly presenting said selected web content comprises the steps of : accessing said web server a plurality of times at time intervals (accelerator signals) determined by said time interval parameter to obtain current web content ;
and audibly presenting said current web content at a plurality of times .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (said time) signal .
US6732142B1
CLAIM 2
. A method of presenting information from the web , comprising the steps of : selecting web content for audible background presentation on a web client digital device , said web client digital device supporting concurrent execution of a plurality of tasks ;
specifying at least one audible presentation parameter , said at least one audible presentation parameter determining when said selected web content will be audibly presented ;
and audibly presenting said selected web content on said web client digital device at a time determined by said at least one audible presentation parameter , said step of audibly presenting said selected web content being performed as a background task of said plurality of tasks executing on said web client digital device , concurrently with visually presenting independent information on a display of said web client digital device , said independent information being presented as at least one task of said plurality of tasks executing on said web client digital device other than said background task , said independent information being unaffected by said audio presentation ;
wherein said at least one audible presentation parameter comprises a time interval for accessing a web server , and wherein said step of audibly presenting said selected web content comprises the steps of : accessing said web server a plurality of times at time intervals determined by said time (digital audio, digital audio signal) interval parameter to obtain current web content ;
and audibly presenting said current web content at a plurality of times .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6687341B1

Filed: 1999-12-21     Issued: 2004-02-03

Network and method for the specification and delivery of customized information content via a telephone interface

(Original Assignee) BellSouth Intellectual Property Corp     (Current Assignee) Open Invention Network LLC

Robert A. Koch, Arnold Chester McQuaide, Jr.
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (incoming telephone call) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6687341B1
CLAIM 2
. The communications network of claim 1 , wherein the intelligent resource server maps the incoming telephone call (interrupt signal) to a URL having an IP address of the server .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (predetermined destination) , and to the word identification .
US6687341B1
CLAIM 16
. The communications network of claim 4 , wherein : the SCP routes an incoming call to the SSP switch terminating at a predetermined destination (distance calculation) number to the intelligent resource server ;
and the intelligent resource servers maps the incoming call to the audio-based document generated by the server .
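Claim 16's pipeline — feature calculation, then distance calculation, then word identification — can be sketched as chained stages. This is a minimal toy illustration, not Zentian's implementation: the frame size, the single "energy" feature, and the one-dimensional acoustic states are all assumptions made for brevity.

```python
# Hypothetical sketch of the three-stage pipeline recited in claim 16:
# feature calculation -> distance calculation -> word identification,
# with data flowing stage to stage via generators.
from typing import Iterator, List

FRAME = 160  # assumed number of samples per audio time frame

def features(audio: List[float]) -> Iterator[List[float]]:
    """Stage 1: derive a feature vector per fixed audio time frame."""
    for i in range(0, len(audio) - FRAME + 1, FRAME):
        frame = audio[i:i + FRAME]
        energy = sum(x * x for x in frame) / FRAME
        yield [energy]  # stand-in for real spectral/cepstral features

def distances(vecs: Iterator[List[float]],
              states: List[List[float]]) -> Iterator[List[float]]:
    """Stage 2: similarity of each feature vector to each acoustic state."""
    for v in vecs:
        yield [sum((a - b) ** 2 for a, b in zip(v, s)) for s in states]

def identify(dists: Iterator[List[float]]) -> Iterator[int]:
    """Stage 3: stand-in word identification -- pick the best-scoring state."""
    for d in dists:
        yield min(range(len(d)), key=d.__getitem__)

audio = [0.1] * 480                 # three frames of toy audio
states = [[0.0], [0.01], [1.0]]     # toy one-dimensional acoustic states
words = list(identify(distances(features(audio), states)))
print(words)
```

Because each stage is a generator, a frame can be scored while the next frame's features are still being computed, which is the pipelining property the claim recites.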




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6349132B1

Filed: 1999-12-16     Issued: 2002-02-19

Voice interface for electronic documents

(Original Assignee) Talk2 Tech Inc     (Current Assignee) Talk2com ; Intellisync LLC

Darren L. Wesemann, Dong-Kyun Nam, Richard T. Newton
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (hierarchical relationship) from the front end .
US6349132B1
CLAIM 1
. In a system that includes an information service and a telephone for interacting with the information service , a method of enabling a user of the telephone to access and navigate electronic documents by presenting to the user an audio representation of a hierarchy of links of the document so as to enhance the ability of the user to navigate the electronic documents , the method comprising the acts of : obtaining an electronic document ;
parsing the electronic document to identify any text and any links included in the content of the electronic document ;
mapping content of the parsed electronic document by performing the acts of : determining whether the text and links included in the content of the document represent categories , first-level links and second-level links in a hierarchical relationship (next feature vector) one with another ;
and to the extent that the text and links represent categories , first-level links and second level links , creating a hierarchical data structure that associates the text and links to the categories , the first-level links and the second-level links ;
generating an audio representation of at least a portion of the parsed electronic document , the audio representation being communicated to a client ;
and prompting the user to select a category from the hierarchical data structure and then successively prompting the user to select any first-level links and second-level links , such that the content of the electronic document is presented audibly to the user and the user can make verbal selections .
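US6349132B1's mapping step — arranging a parsed document's text and links into categories, first-level links, and second-level links — can be illustrated with a toy hierarchy builder. This is a hypothetical sketch: real HTML parsing is out of scope, and the `(depth, label)` input format is an assumption.

```python
# Illustrative sketch of US6349132B1's hierarchical data structure:
# categories hold first-level links, which hold second-level links.
def map_hierarchy(entries):
    """entries: (depth, label) pairs in document order; depth 0 = category,
    1 = first-level link, 2 = second-level link, each nested under the
    most recent shallower entry."""
    tree = {}
    category = first_level = None
    for depth, label in entries:
        if depth == 0:
            category = label
            tree[category] = {}
        elif depth == 1:
            first_level = label
            tree[category][first_level] = []
        else:
            tree[category][first_level].append(label)
    return tree

doc = [(0, "News"), (1, "World"), (2, "Europe"), (2, "Asia"), (1, "Local")]
print(map_hierarchy(doc))
```

The resulting structure is what the claim's prompting step would walk: first a category, then any first-level links under it, then second-level links.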

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (third part) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6349132B1
CLAIM 7
. A method as recited in claim 6 wherein the instruction received from the client is at least one of an instruction to email , fax , or voice mail at least a portion of the electronic document to a third party (calculating means) selected from the client's contact list .
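The lexical tree recited in claim 14 — "the lexical tree comprising a model of words" — is conventionally a prefix tree in which words sharing initial acoustic units share nodes. The following is an assumed, minimal structure (letters standing in for phone models), not taken from the patent:

```python
# Minimal sketch of a lexical tree: words sharing a prefix share nodes,
# so the search stage scores common prefixes only once.
def build_lexical_tree(words):
    root = {}
    for word in words:
        node = root
        for unit in word:          # here: letters stand in for phone models
            node = node.setdefault(unit, {})
        node["#"] = word           # word-end marker at the leaf
    return root

tree = build_lexical_tree(["cat", "car", "cab"])
# all three words share the 'c' -> 'a' prefix path
print(sorted(tree["c"]["a"].keys()))
```

During search, the calculated distances score the shared `c`–`a` path once, and the tree only branches where the words diverge.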




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6442519B1

Filed: 1999-11-10     Issued: 2002-08-27

Speaker model adaptation via network of similar users

(Original Assignee) International Business Machines Corp     (Current Assignee) Nuance Communications Inc

Dimitri Kanevsky, Vit V. Libal, Jan Sedivy, Wlodek W. Zadrozny
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .
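Claim 1's architecture — front end and search stage on a first processor, the distance-calculating circuit on a second, with data pipelined between them — can be modelled in software with two threads joined by queues. This is a rough sketch under stated assumptions (scalar "features", three toy acoustic states), not Zentian's hardware:

```python
# Minimal sketch of claim 1's two-processor pipeline: the 'second
# processor' (distance calculating circuit) runs in a worker thread,
# fed by and feeding bounded queues.
import threading
import queue

feature_q = queue.Queue(maxsize=4)   # front end -> calculating circuit
distance_q = queue.Queue(maxsize=4)  # calculating circuit -> search stage
STATES = [0.0, 0.5, 1.0]             # toy acoustic states

def calculating_circuit():
    """'Second processor': turns feature vectors into distance sets."""
    while True:
        f = feature_q.get()
        if f is None:                # end-of-stream marker
            distance_q.put(None)
            return
        distance_q.put([(f - s) ** 2 for s in STATES])

worker = threading.Thread(target=calculating_circuit)
worker.start()

for f in [0.1, 0.6, 0.9]:            # 'front end' emits toy features
    feature_q.put(f)
feature_q.put(None)

results = []
while (d := distance_q.get()) is not None:   # 'search stage' consumes
    results.append(min(range(len(d)), key=d.__getitem__))
worker.join()
print(results)
```

The bounded queues also model the claim's pipelining: the front end can run ahead of the search stage by up to the queue depth before blocking.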

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .
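Claim 3's dynamic scheduling — choosing whether the first processor runs front-end or search-stage code based on availability of distance results and of space for more feature vectors — can be sketched as a simple decision function. The buffer sizes and the stage names are assumptions for illustration only:

```python
# Hedged sketch of claim 3's dynamic scheduling on a single processor.
from collections import deque

MAX_PENDING = 2
feature_buf = deque()   # feature vectors awaiting distance calculation
distance_buf = deque()  # distance results awaiting the search stage

def schedule_step(next_frame_available):
    """Pick which code the first processor runs next, per claim 3's tests:
    availability of distance results and of space for more features."""
    if distance_buf:
        return "search_stage"        # distance results are waiting
    if next_frame_available and len(feature_buf) < MAX_PENDING:
        return "front_end"           # room to store another feature vector
    return "stall"                   # neither stage can make progress

decisions = []
decisions.append(schedule_step(True))    # empty buffers -> run front end
feature_buf.extend(["f0", "f1"])         # feature store now full
decisions.append(schedule_step(True))    # no space, no results -> stall
distance_buf.append([0.2, 0.7])          # accelerator delivers results
decisions.append(schedule_step(True))    # results ready -> run search stage
print(decisions)
```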

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means adapted to implement frame dropping (noise generation, sound generation) , to discard one or more audio time frames .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US6442519B1
CLAIM 6
. A speech recognition system as in claim 2 , further comprising means for receiving user production activities , said means for receiving user production activities being capable of receiving activity selected from the group consisting of dictation , conversation , error correction , sound generation (frame dropping) , noise generation (frame dropping) and music generation .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals (particular user) to the search stage when the distances for a new frame are available in a result memory .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user (accelerator signals) , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .
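Claim 13 names the Mahalanobis distance as the similarity measure. A worked example for the common diagonal-covariance simplification (the claim itself does not fix the covariance form, so the diagonal assumption is ours):

```python
# Worked example of the Mahalanobis distance of claim 13, for a
# diagonal covariance: sqrt(sum_i (x_i - mu_i)^2 / var_i).
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance of x from an acoustic state with the given
    mean vector and per-dimension variances (diagonal covariance)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

# Feature vector [1, 2] against a state with mean [0, 0], variances [1, 4]:
d = mahalanobis_diag([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
print(d)  # sqrt(1/1 + 4/4) = sqrt(2)
```

The division by the variance is what distinguishes this from Euclidean distance: dimensions the state models loosely (large variance) contribute less to the score.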

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6442519B1
CLAIM 15
. A speech recognition method (speech recognition method) for recognizing speech from each of a plurality of computer users , said method comprising the steps of : a) clustering computer users coupled together over a network of connected computers into classes of similar users , at least one acoustic model being maintained on a corresponding one of said connected computers for each of said computer users ;
b) for each of said classes , identifying similar acoustic models being used by clustered users ;
c) modifying one user acoustic model responsive to user production activities by a corresponding clustered user ;
d) comparing and adapting all said identified similar acoustic models responsive to modification of said one user acoustic model ;
and e) transmitting user data over said network , said transmitted user data including information about user activities and user acoustic model data .
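US6442519B1's claim 15 step (a) — clustering networked users into classes of similar users by comparing their acoustic models — can be sketched with a toy model reduced to a single mean vector per user. The distance threshold and greedy grouping are assumptions, not the patent's method:

```python
# Rough sketch of clustering users by acoustic-model similarity,
# with each user's 'model' reduced to one mean vector.
def model_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_users(models, threshold):
    """Greedy single-link grouping of users whose models are close."""
    clusters = []
    for user, model in models.items():
        for c in clusters:
            if any(model_distance(model, models[u]) < threshold for u in c):
                c.append(user)
                break
        else:
            clusters.append([user])
    return clusters

models = {"alice": [0.0, 0.0], "bob": [0.1, 0.0], "carol": [5.0, 5.0]}
print(cluster_users(models, threshold=1.0))
```

Steps (b)–(d) of the claim would then propagate a modification of one user's model to the other models in the same cluster.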

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US6442519B1
CLAIM 1
. A speech recognition system for recognizing speech (speech recognition circuit, word identification) input from computer users connected together over a network of computers , a plurality of said computers each including at least one acoustic model trained for a particular user , said system comprising : means for comparing acoustic models of one or more computer users , each of said computer users using one of a plurality of computers ;
means for clustering users on a network of said plurality of computers into clusters of similar users responsive to said comparison of acoustic models ;
means for modifying each of said acoustic models responsive to user production activities ;
means for comparing identified similar acoustic models and , responsive to modification of one or more of said acoustic models , modifying one or more compared said identified similar acoustic models ;
and means for transmitting acoustic model data over said network to other computers connected to said network .

US6442519B1
CLAIM 15
. A speech recognition method (speech recognition method) for recognizing speech from each of a plurality of computer users , said method comprising the steps of : a) clustering computer users coupled together over a network of connected computers into classes of similar users , at least one acoustic model being maintained on a corresponding one of said connected computers for each of said computer users ;
b) for each of said classes , identifying similar acoustic models being used by clustered users ;
c) modifying one user acoustic model responsive to user production activities by a corresponding clustered user ;
d) comparing and adapting all said identified similar acoustic models responsive to modification of said one user acoustic model ;
and e) transmitting user data over said network , said transmitted user data including information about user activities and user acoustic model data .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2001117583A

Filed: 1999-10-15     Issued: 2001-04-27

Speech recognition apparatus, speech recognition method, and recording medium (音声認識装置および音声認識方法、並びに記録媒体)

(Original Assignee) Sony Corp; ソニー株式会社     

Hideki Kishi, 秀樹 岸
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, wherein the speech recognition means comprises storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to the unknown word by referring to the reference data of another language.

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (データ) for a predetermined time frame .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, wherein the speech recognition means comprises storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to the unknown word by referring to the reference data of another language.

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector (備えること) from the front end .
JP2001117583A
CLAIM 1
[Claim 1] A speech recognition apparatus for recognizing input speech, characterized by comprising (next feature): extraction means for extracting feature parameters of the speech; and speech recognition means for recognizing the speech based on the feature parameters and outputting, to natural language processing means that performs natural language processing, one or more candidates of the speech recognition result together with a score corresponding to the likelihood of each candidate.

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (処理結果, 認識結果) .
JP2001117583A
CLAIM 1
[Claim 1] A speech recognition apparatus for recognizing input speech, characterized by comprising: extraction means for extracting feature parameters of the speech; and speech recognition means for recognizing the speech based on the feature parameters and outputting, to natural language processing means that performs natural language processing, one or more candidates of the speech recognition result (result memory) together with a score corresponding to the likelihood of each candidate.

JP2001117583A
CLAIM 3
[Claim 3] The speech recognition apparatus according to claim 1, wherein the natural language processing means applies natural language processing to the one or more candidates of the speech recognition result and selects, from among the natural language processing results, a final natural language processing result (result memory) based on the linguistic reliability of each candidate of the speech recognition result and on the score.

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (data) .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, characterized in that the speech recognition means has storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to that unknown word by referring to the reference data for another language.

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (data) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, characterized in that the speech recognition means has storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to that unknown word by referring to the reference data for another language.

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (data) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, characterized in that the speech recognition means has storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to that unknown word by referring to the reference data for another language.
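Claim 15's "distance indicating the similarity between a feature vector and a predetermined acoustic state" is commonly realized in HMM systems as a Gaussian negative log-likelihood; the sketch below uses a diagonal-covariance Gaussian as one plausible instance. The formula is a standard choice, not the patent's specific distance.

```python
import math

def state_distance(feature_vec, state_mean, state_var):
    """Toy per-state distance: negative log-likelihood of the feature vector
    under a diagonal-covariance Gaussian acoustic state. Smaller values mean
    the vector is more similar to the state."""
    nll = 0.0
    for x, mu, var in zip(feature_vec, state_mean, state_var):
        nll += 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return nll
```

The calculating circuit of the claims would evaluate such a distance for every active acoustic state per frame, which is why the patent offloads it to a dedicated processor.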

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (data) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JP2001117583A
CLAIM 6
[Claim 6] The speech recognition apparatus according to claim 1, characterized in that the speech recognition means has storage means storing, for each language, reference data (audio signal) that is referred to in performing speech recognition, and, when an unknown word is detected in the speech, recognizes the speech corresponding to that unknown word by referring to the reference data for another language.
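The lexical-tree word identification recited throughout these claims can be sketched as follows: words sharing a prefix share tree nodes, each node carries the distance cost of its phone, and the cheapest root-to-leaf path names the recognized word. The tree layout, phone inventory, and costs below are all hypothetical.

```python
def build_tree(words):
    # Nested dicts keyed by phone; the empty-string key marks a complete
    # word ending at that node, so shared prefixes share nodes.
    root = {}
    for word, phones in words.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})
        node[''] = word
    return root

def best_word(tree, phone_cost, acc=0.0):
    # Depth-first search for the minimum accumulated distance to a word end.
    best = (float('inf'), None)
    for phone, child in tree.items():
        if phone == '':
            best = min(best, (acc, child))
        else:
            best = min(best, best_word(child, phone_cost, acc + phone_cost[phone]))
    return best
```

A real search stage would run a time-synchronous beam search with per-frame distances rather than one static cost per phone, but the prefix-sharing tree structure is the same.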




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6539353B1

Filed: 1999-10-12     Issued: 2003-03-25

Confidence measures using sub-word-dependent weighting of sub-word confidence scores for robust speech recognition

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Li Jiang, Xuedong Huang
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (noise signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (different sub) whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6539353B1
CLAIM 2
. The speech recognition system of claim 1 wherein attributing different weights to different sub-words (dynamic scheduling) comprises attributing different parameters to functions of a sub-word confidence score .
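One hypothetical reading of the dynamic scheduling of claim 3 is a simple policy for a single processor shared by the front end and the search stage: run the search when distance results are waiting, otherwise run the front end while there is space to buffer more feature vectors. The policy and names are illustrative.

```python
def schedule(distance_results_ready, feature_buffer_free):
    """Decide which task the first processor should run next."""
    if distance_results_ready:
        return 'search'        # consume available distance results first
    if feature_buffer_free:
        return 'front_end'     # produce more feature vectors
    return 'idle'              # stalled: nothing to search, nowhere to buffer
```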

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (noise signal) for a predetermined time frame .
US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (discriminative training) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6539353B1
CLAIM 11
. The method of claim 10 wherein applying different weights to sub-word confidence scores further comprises selecting the parameters for a class based on discriminative training (speech accelerator, speech recognition method) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6539353B1
CLAIM 9
. A method of speech recognition comprising : extracting at least one feature from a set of digital values that represent a speech signal (result memory) ;
identifying a hypothesis word formed of sub-words from the feature ;
determining sub-word confidence scores from the feature for each sub-word of the hypothesis word ;
and determining a word confidence score for the hypothesis word by applying different weights to the sub-word confidence scores associated with different classes of sub-words of the hypothesis word .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (digital value) signal .
US6539353B1
CLAIM 9
. A method of speech recognition comprising : extracting at least one feature from a set of digital values (digital audio) that represent a speech signal ;
identifying a hypothesis word formed of sub-words from the feature ;
determining sub-word confidence scores from the feature for each sub-word of the hypothesis word ;
and determining a word confidence score for the hypothesis word by applying different weights to the sub-word confidence scores associated with different classes of sub-words of the hypothesis word .

US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (noise signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .

US7979277B2
CLAIM 15
. A speech recognition method (discriminative training) , comprising : calculating a feature vector from an audio signal (noise signal) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6539353B1
CLAIM 11
. The method of claim 10 wherein applying different weights to sub-word confidence scores further comprises selecting the parameters for a class based on discriminative training (speech accelerator, speech recognition method) .

US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (discriminative training) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (noise signal) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6539353B1
CLAIM 11
. The method of claim 10 wherein applying different weights to sub-word confidence scores further comprises selecting the parameters for a class based on discriminative training (speech accelerator, speech recognition method) .

US6539353B1
CLAIM 16
. A computer-readable medium having computer executable instructions for performing steps comprising : receiving a digital signal representative of an input speech and noise signal (audio signal) ;
extracting features from the digital signal ;
identifying a recognition score from the features for a hypothesis word formed of sub-words ;
and determining a confidence score for the hypothesis word based on weighted sub-word confidence scores determined from the features , the weighting of sub-word confidence scores applying different weights to sub-word confidence scores associated with different classes of sub-words for the sub-words of the word .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6542866B1

Filed: 1999-09-22     Issued: 2003-04-01

Speech recognition method and apparatus utilizing multiple feature streams

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Li Jiang, Xuedong Huang
US7979277B2
CLAIM 8
. The speech recognition circuit of claim 1 , wherein the processor is configured to divert to another task if the data flow (one path) stalls .
US6542866B1
CLAIM 3
. The speech recognition system of claim 2 wherein one path (data flow) score is generated by selecting one segment score from a group and a second path score is generated by selecting a second segment score from the same group .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (second language, first language) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6542866B1
CLAIM 33
. The apparatus of claim 28 wherein the first speech recognition system comprises a common feature extractor , a common acoustic model and a first language (speech accelerator) model and wherein the second speech recognition system comprises the common feature extractor , the common acoustic model and a second language (speech accelerator) model that is different from the first language model .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6542866B1
CLAIM 27
. A computer-readable medium having computer-executable instructions for performing steps comprising : receiving a digital signal representative of an input speech signal (result memory) ;
extracting at least two feature vectors for a frame of the digital signal ;
and determining a path score that is indicative of the probability that a word is represented by the digital signal through steps comprising : using different feature vectors of a frame to determine a group of segment scores that each represent a separate probability of a same segment unit appearing within a segment ;
selecting one of the segment scores from the group as a chosen segment score ;
and combining chosen segment scores from multiple segments to produce a path score for a word .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (digital value) signal .
US6542866B1
CLAIM 1
. A speech recognition system for identifying words from a series of digital values (digital audio) representing speech , the system comprising : a first feature extractor for generating a first feature vector for a segment using a first type of feature of a portion of the series of digital values ;
a second feature extractor for generating a second feature vector for the same segment as the first feature extractor using a second type of feature of a portion of the series of digital values ;
a decoder capable of generating a path score that is indicative of the probability that a sequence of words is represented by the series of digital values , the path score being based in part on a single chosen segment score selected from a group of at least two segment scores wherein each segment score in the group represents a separate probability of a same segment unit appearing within a segment but wherein each segment score in the group is based on a different feature vector formed using a different type of feature for the same segment .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow (one path) .
US6542866B1
CLAIM 3
. The speech recognition system of claim 2 wherein one path (data flow) score is generated by selecting one segment score from a group and a second path score is generated by selecting a second segment score from the same group .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6778557B1

Filed: 1999-09-22     Issued: 2004-08-17

Point-to-multipoint communication system

(Original Assignee) Toshiba Corp     (Current Assignee) Toshiba Corp

Yoshinori Yuki, Masatoshi Nakao, Hiroyuki Ibe, Yoshio Hatate
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (first transmit) ;

a calculating circuit (data storage means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6778557B1
CLAIM 22
. The point-to-multipoint communication system according to claim 21 , characterized in that the slave units comprise data storage means (calculating circuit) for temporarily storing data by service class , and storage amount counting means for counting , separately for each service class , the storage amount of data stored by the data storage means ;
and the information amount reporting means operates such that the storage amounts counted by the storage amount counting means separately for each service class are reported to the master unit in accordance with instructions from the master unit .

US6778557B1
CLAIM 32
. The point-to-multipoint communication system according to claim 30 , characterized in that the signal transmission means first transmits (time frame) signals of the high priority service classes when instructions to transmit signals are received from the master unit .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results (measurement result) and/or availability of space for storing more feature vectors and/or distance results .
US6778557B1
CLAIM 33
. A point-to-multipoint communication system , in which a master unit and a plurality of slave units are connected by means of a transmission line , and the slave units transmit information signals by authorization from the master unit , wherein the point-to-multipoint communication system is characterized in that the slave units comprise measurement means for measuring the information amount to be transmitted , and first notification means for notifying the master unit of the measurement results (distance results, distance calculation) obtained by the measurement means ;
the master unit comprises calculation means for calculating a transmission-enabling information amount that allows each slave unit to transmit signals at no more than a specific maximum value by determining a maximum overall transmittable information amount of each currently active slave unit by allocating among the currently active slave units the maximum transmittable information amount that can be transmitted from the slave units to the master unit , calculating as the transmission-enabling information amount the information amount reported by the slave units if the information amount thus reported does not reach the maximum transmittable information amount , and calculating as the transmission-enabling information amount the maximum transmittable information amount if the information amount reported by the slave units exceeds the maximum transmittable information amount , and second notification means for assigning the transmission line as time slot units to the slave units on the basis of the transmission-enabling information amount calculated by the calculation means , and notifying the slave units about the results of this assigning ;
and the slave units that have received the notifications from the second notification means transmit data signals in the time slot units thus assigned .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (data storage means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6778557B1
CLAIM 22
. The point-to-multipoint communication system according to claim 21 , characterized in that the slave units comprise data storage means (calculating circuit) for temporarily storing data by service class , and storage amount counting means for counting , separately for each service class , the storage amount of data stored by the data storage means ;
and the information amount reporting means operates such that the storage amounts counted by the storage amount counting means separately for each service class are reported to the master unit in accordance with instructions from the master unit .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (comprises information) adapted to implement frame dropping , to discard one or more audio time frames .
US6778557B1
CLAIM 11
. The point-to-multipoint communication system according to claim 1 , characterized in that the master unit further comprises information amount report instruction means (control means) for instructing the slave units to issue reports regarding the information amount necessary to transmit the signals ;
and the information amount reporting means presents the master unit with reports regarding the information amount necessary to transmit the signals in accordance with the instructions from the information amount report instruction means .
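The frame dropping of US7979277B2 claim 6 can be read, purely as a hypothetical sketch, as discarding incoming audio time frames when no buffer space remains rather than stalling the pipeline. The function and its policy are illustrative assumptions.

```python
def feed_frames(frames, buffer, capacity):
    """Append frames to the buffer while space remains; drop the rest.
    Returns the number of discarded audio time frames."""
    dropped = 0
    for f in frames:
        if len(buffer) < capacity:
            buffer.append(f)
        else:
            dropped += 1
    return dropped
```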

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (first transmit) .
US6778557B1
CLAIM 32
. The point-to-multipoint communication system according to claim 30 , characterized in that the signal transmission means first transmits (time frame) signals of the high priority service classes when instructions to transmit signals are received from the master unit .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (second notification) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6778557B1
CLAIM 33
. A point-to-multipoint communication system , in which a master unit and a plurality of slave units are connected by means of a transmission line , and the slave units transmit information signals by authorization from the master unit , wherein the point-to-multipoint communication system is characterized in that the slave units comprise measurement means for measuring the information amount to be transmitted , and first notification means for notifying the master unit of the measurement results obtained by the measurement means ;
the master unit comprises calculation means for calculating a transmission-enabling information amount that allows each slave unit to transmit signals at no more than a specific maximum value by determining a maximum overall transmittable information amount of each currently active slave unit by allocating among the currently active slave units the maximum transmittable information amount that can be transmitted from the slave units to the master unit , calculating as the transmission-enabling information amount the information amount reported by the slave units if the information amount thus reported does not reach the maximum transmittable information amount , and calculating as the transmission-enabling information amount the maximum transmittable information amount if the information amount reported by the slave units exceeds the maximum transmittable information amount , and second notification (interrupt signal) means for assigning the transmission line as time slot units to the slave units on the basis of the transmission-enabling information amount calculated by the calculation means , and notifying the slave units about the results of this assigning ;
and the slave units that have received the notifications from the second notification means transmit data signals in the time slot units thus assigned .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (first transmit) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6778557B1
CLAIM 32
. The point-to-multipoint communication system according to claim 30 , characterized in that the signal transmission means first transmits (time frame) signals of the high priority service classes when instructions to transmit signals are received from the master unit .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (first transmit) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (data storage means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6778557B1
CLAIM 22
. The point-to-multipoint communication system according to claim 21 , characterized in that the slave units comprise data storage means (calculating circuit) for temporarily storing data by service class , and storage amount counting means for counting , separately for each service class , the storage amount of data stored by the data storage means ;
and the information amount reporting means operates such that the storage amounts counted by the storage amount counting means separately for each service class are reported to the master unit in accordance with instructions from the master unit .

US6778557B1
CLAIM 32
. The point-to-multipoint communication system according to claim 30 , characterized in that the signal transmission means first transmits (time frame) signals of the high priority service classes when instructions to transmit signals are received from the master unit .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (first transmit) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation (measurement result) , and to the word identification .
US6778557B1
CLAIM 32
. The point-to-multipoint communication system according to claim 30 , characterized in that the signal transmission means first transmits (time frame) signals of the high priority service classes when instructions to transmit signals are received from the master unit .

US6778557B1
CLAIM 33
. A point-to-multipoint communication system , in which a master unit and a plurality of slave units are connected by means of a transmission line , and the slave units transmit information signals by authorization from the master unit , wherein the point-to-multipoint communication system is characterized in that the slave units comprise measurement means for measuring the information amount to be transmitted , and first notification means for notifying the master unit of the measurement results (distance results, distance calculation) obtained by the measurement means ;
the master unit comprises calculation means for calculating a transmission-enabling information amount that allows each slave unit to transmit signals at no more than a specific maximum value by determining a maximum overall transmittable information amount of each currently active slave unit by allocating among the currently active slave units the maximum transmittable information amount that can be transmitted from the slave units to the master unit , calculating as the transmission-enabling information amount the information amount reported by the slave units if the information amount thus reported does not reach the maximum transmittable information amount , and calculating as the transmission-enabling information amount the maximum transmittable information amount if the information amount reported by the slave units exceeds the maximum transmittable information amount , and second notification means for assigning the transmission line as time slot units to the slave units on the basis of the transmission-enabling information amount calculated by the calculation means , and notifying the slave units about the results of this assigning ;
and the slave units that have received the notifications from the second notification means transmit data signals in the time slot units thus assigned .
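The pipelined dataflow recited in US7979277B2 claim 16 above (feature calculation, then distance calculation, then word identification) can be illustrated with a minimal, hypothetical sketch; the stage names, toy acoustic model, and two-word vocabulary below are illustrative assumptions, not the patented circuit:

```python
import math

def front_end(frames):
    """Stage 1: derive a toy feature vector (mean, energy) per audio time frame."""
    for frame in frames:
        mean = sum(frame) / len(frame)
        energy = sum(s * s for s in frame) / len(frame)
        yield [mean, energy]

def distance_stage(features, states):
    """Stage 2: Euclidean distance from each feature vector to each acoustic state."""
    for f in features:
        yield [math.dist(f, s) for s in states]

def search_stage(distances, words):
    """Stage 3: pick the word whose state scored the smallest distance per frame."""
    for d in distances:
        yield words[d.index(min(d))]

# Toy acoustic model: two states, each labelled with one word (an assumption).
states = [[0.0, 0.0], [1.0, 1.0]]
words = ["silence", "speech"]
frames = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

# Chaining the generators pipelines data frame by frame through the three stages.
result = list(search_stage(distance_stage(front_end(frames), states), words))
print(result)  # ['silence', 'speech']
```

Because each stage is a generator, a frame flows to the next stage as soon as it is produced, which loosely mirrors the claim's pipelining of data from feature calculation to distance calculation to word identification.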




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6446039B1

Filed: 1999-08-23     Issued: 2002-09-03

Speech recognition method, speech recognition device, and recording medium on which is recorded a speech recognition processing program

(Original Assignee) Seiko Epson Corp     (Current Assignee) Seiko Epson Corp

Yasunaga Miyazawa, Mitsuhiro Inazumi, Hiroshi Hasegawa, Masahisa Ikejiri
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (when recognizable) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6446039B1
CLAIM 5
. The speech recognition method as set forth in claim 1 , further comprising : performing speaker learning processing using at least one of the registration word data , the standard speaker sound model data , or the specific speaker group sound model data , and when recognizable (time frame) words other than the registration words are recognized , recognizing sound by using the post-speaker learning data for speaker adaptation .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (when recognizable) .
US6446039B1
CLAIM 5
. The speech recognition method as set forth in claim 1 , further comprising : performing speaker learning processing using at least one of the registration word data , the standard speaker sound model data , or the specific speaker group sound model data , and when recognizable (time frame) words other than the registration words are recognized , recognizing sound by using the post-speaker learning data for speaker adaptation .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (when recognizable) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6446039B1
CLAIM 5
. The speech recognition method as set forth in claim 1 , further comprising : performing speaker learning processing using at least one of the registration word data , the standard speaker sound model data , or the specific speaker group sound model data , and when recognizable (time frame) words other than the registration words are recognized , recognizing sound by using the post-speaker learning data for speaker adaptation .

US7979277B2
CLAIM 15
. A speech recognition method (speech recognition method) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (when recognizable) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6446039B1
CLAIM 1
. A speech recognition method (speech recognition method) , comprising : creating standard speaker sound model data from a plurality of non-specific speaker sound data and recognizing a predetermined plurality of words ;
selecting several words as registration words among a plurality of recognizable words , a recognition target speaker speaking the respective registration words ;
creating and storing registration word data for the respective registration words from the sound data ;
and recognizing registration words spoken by the recognition target speaker using the registration word data , and recognizing words other than the registration words using the standard speaker sound model data .

US6446039B1
CLAIM 5
. The speech recognition method as set forth in claim 1 , further comprising : performing speaker learning processing using at least one of the registration word data , the standard speaker sound model data , or the specific speaker group sound model data , and when recognizable (time frame) words other than the registration words are recognized , recognizing sound by using the post-speaker learning data for speaker adaptation .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (speech recognition method) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (when recognizable) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6446039B1
CLAIM 1
. A speech recognition method (speech recognition method) , comprising : creating standard speaker sound model data from a plurality of non-specific speaker sound data and recognizing a predetermined plurality of words ;
selecting several words as registration words among a plurality of recognizable words , a recognition target speaker speaking the respective registration words ;
creating and storing registration word data for the respective registration words from the sound data ;
and recognizing registration words spoken by the recognition target speaker using the registration word data , and recognizing words other than the registration words using the standard speaker sound model data .

US6446039B1
CLAIM 5
. The speech recognition method as set forth in claim 1 , further comprising : performing speaker learning processing using at least one of the registration word data , the standard speaker sound model data , or the specific speaker group sound model data , and when recognizable (time frame) words other than the registration words are recognized , recognizing sound by using the post-speaker learning data for speaker adaptation .
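The "distance indicating the similarity between a feature vector and a predetermined acoustic state" recited in the US7979277B2 claims above is commonly implemented as a negative log-likelihood under a Gaussian state model. A minimal sketch, assuming a diagonal-covariance Gaussian (the function name and the toy mean/variance values are illustrative assumptions, not taken from the patent):

```python
import math

def gaussian_distance(feature, mean, var):
    """Negative log-likelihood of a diagonal-covariance Gaussian acoustic state.
    Smaller values indicate a feature vector more similar to the state."""
    d = 0.0
    for x, m, v in zip(feature, mean, var):
        d += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return d

# Toy acoustic state centred at the origin with unit variance (an assumption).
state_mean = [0.0, 0.0]
state_var = [1.0, 1.0]

near = gaussian_distance([0.1, -0.1], state_mean, state_var)
far = gaussian_distance([3.0, 3.0], state_mean, state_var)
print(near < far)  # a closer feature vector yields a smaller distance
```

In a hardware accelerator of the kind the patent describes, this per-state computation is the inner loop that would be offloaded to the second processor, with one distance produced per acoustic state per frame.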




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6366578B1

Filed: 1999-08-04     Issued: 2002-04-02

Systems and methods for multiple mode voice and data communications using intelligently bridged TDM and packet buses and methods for implementing language capabilities using the same

(Original Assignee) Vertical Networks Inc     (Current Assignee) RPX Corp ; Vertical Networks Inc

Christopher Sean Johnson
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (allocation rule) whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6366578B1
CLAIM 15
. The method of claim 1 , wherein the system further includes a resource allocation program operable by the processor , wherein the resource allocation program assigns resources within the system according to predetermined allocation rules (comprising dynamic scheduling) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals (particular user) to the search stage when the distances for a new frame are available in a result memory .
US6366578B1
CLAIM 4
. The method of claim 3 , further comprising the step of determining which of the particular languages are to be used for a particular user (accelerator signals) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2000322078A

Filed: 1999-05-14     Issued: 2000-11-24

車載型音声認識装置 (In-vehicle speech recognition device)

(Original Assignee) Sumitomo Electric Ind Ltd

Osamu Hattori, Kazuya Morita
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (memory) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JP2000322078A
CLAIM 2
[Claim 2] The in-vehicle speech recognition device according to claim 1 , wherein a speech waveform pattern of said "specific words corresponding to the start of operation" can be registered in advance in a memory (first processor) .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (memory) .
JP2000322078A
CLAIM 2
[Claim 2] The in-vehicle speech recognition device according to claim 1 , wherein a speech waveform pattern of said "specific words corresponding to the start of operation" can be registered in advance in a memory (first processor) .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (memory) should run the front end or search stage code , based on availability or unavailability of distance results (determination result) and/or availability of space for storing more feature vectors and/or distance results .
JP2000322078A
CLAIM 1
[Claim 1] An in-vehicle speech recognition device comprising a speech recognition unit that recognizes a user's operation speech and an execution processing unit that executes processing of the device to be operated based on the content of the operation speech recognized by the speech recognition means , the device characterized by comprising : speech operation start determination means capable of recognizing only specific words corresponding to the user's start of operation ; and control means for placing the speech recognition unit into an active state based on the determination result (distance results, result memory) of the speech operation start determination means .

JP2000322078A
CLAIM 2
[Claim 2] The in-vehicle speech recognition device according to claim 1 , wherein a speech waveform pattern of said "specific words corresponding to the start of operation" can be registered in advance in a memory (first processor) .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (memory) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
JP2000322078A
CLAIM 2
[Claim 2] The in-vehicle speech recognition device according to claim 1 , wherein a speech waveform pattern of said "specific words corresponding to the start of operation" can be registered in advance in a memory (first processor) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (determination result) .
JP2000322078A
CLAIM 1
[Claim 1] An in-vehicle speech recognition device comprising a speech recognition unit that recognizes a user's operation speech and an execution processing unit that executes processing of the device to be operated based on the content of the operation speech recognized by the speech recognition means , the device characterized by comprising : speech operation start determination means capable of recognizing only specific words corresponding to the user's start of operation ; and control means for placing the speech recognition unit into an active state based on the determination result (distance results, result memory) of the speech operation start determination means .
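Several of the US7979277B2 claims charted above recite "a lexical tree comprising a model of words", i.e. a prefix tree over phone sequences in which words sharing an initial pronunciation share nodes. A minimal sketch, with a hypothetical three-word lexicon and the "#word" end-marker key chosen purely for illustration:

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences; words sharing a prefix share nodes."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})  # reuse an existing branch if present
        node["#word"] = word  # mark a word-end node
    return root

def words_in_tree(node):
    """Collect every word reachable from a node (a toy stand-in for the search stage)."""
    found = [node["#word"]] if "#word" in node else []
    for key, child in node.items():
        if key != "#word":
            found += words_in_tree(child)
    return found

# Hypothetical lexicon: 'cat' and 'cap' share the 'k'->'ae' prefix branch.
lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
tree = build_lexical_tree(lexicon)
print(sorted(words_in_tree(tree)))  # ['cap', 'cat', 'dog']
print(sorted(tree.keys()))          # ['d', 'k'] -- only two root branches
```

In a real decoder the search stage would walk this tree frame by frame, pruning branches whose accumulated distances score poorly; here the tree only demonstrates the shared-prefix structure the claims assume.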




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JP2000310999A

Filed: 1999-04-26     Issued: 2000-11-07

設備制御システム (Equipment control system)

(Original Assignee) Asahi Chem Ind Co Ltd

Hideyuki Yamagishi, Makoto Shosakai
US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling (signal processing circuit) whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
JP2000310999A
CLAIM 15
[Claim 15] An equipment control system in which an input speech signal is speech-recognized via speech input means and a device to be controlled is caused to execute an operation corresponding to the speech recognition result , the system characterized by comprising : a memory that stores characteristics of distortion of a speaker's voice or of acoustic distortion ; a signal processing circuit (dynamic scheduling) that , at the time of speech recognition , corrects the distortion of the input speech signal using the corresponding distortion characteristics stored in the memory ; and speech recognition means for performing speech recognition on the speech signal whose distortion has been corrected .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature (based on) vector from the front end .
JP2000310999A
CLAIM 2
[Claim 2] An equipment control system in which an input speech signal is speech-recognized by speech recognition means and a device to be controlled is caused to execute an operation corresponding to the speech recognition result , the system characterized by comprising : a plurality of speech input means for inputting speech and outputting speech signals ; and arbitration means for detecting , based on (next feature) the speech signals output from the plurality of speech input means , the speech input means with the best speech quality , and connecting the detected speech input means to the speech recognition means .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
GB2336974A

Filed: 1999-04-21     Issued: 1999-11-03

Singlecast interactive radio system

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Leon Lumelsky
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (generating means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (data repository) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
GB2336974A
CLAIM 1
A singlecast interactive radio system , comprising an authoring system for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
a data repository (first processor) , operatively coupled to the authoring system , for storing the phonetically encoded audio information signal ;
a personal radio system server , operatively coupled to the data repository via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the data repository upon a request by a system user ;
and at least one user terminal , operatively coupled to the server via a wireless communications network , the user terminal being adapted to generate and transmit the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and to receive and decode the at least a portion of the encoded audio information signal to synthesize an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the user terminal further including a plurality of prerecorded phonetic unit dictionaries selectively used to provide the audio signal with one of a plurality of narrative voices during playback to the user .

GB2336974A
CLAIM 10
. An information signal delivery system , comprising information signal generating means (calculating circuit) for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
storage means , operatively coupled to the information signal generating means , for storing the phonetically encoded audio information signal ;
information signal retrieval means , operatively coupled to the storage means via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the storage means upon a request by a system user ;
and information request and delivery means , operatively coupled to the information signal retrieval means via a wireless communications network , the information request and delivery means including means for generating and transmitting the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and means for receiving and decoding the at least a portion of the encoded audio information signal to generate an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the information request and delivery means including means for selecting one of a plurality of narrative voices to be associated with the audio signal heard by the user .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (data repository) .
GB2336974A
CLAIM 1
A singlecast interactive radio system , comprising an authoring system for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
a data repository (first processor) , operatively coupled to the authoring system , for storing the phonetically encoded audio information signal ;
a personal radio system server , operatively coupled to the data repository via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the data repository upon a request by a system user ;
and at least one user terminal , operatively coupled to the server via a wireless communications network , the user terminal being adapted to generate and transmit the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and to receive and decode the at least a portion of the encoded audio information signal to synthesize an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the user terminal further including a plurality of prerecorded phonetic unit dictionaries selectively used to provide the audio signal with one of a plurality of narrative voices during playback to the user .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (data repository) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
GB2336974A
CLAIM 1
A singlecast interactive radio system , comprising an authoring system for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
a data repository (first processor) , operatively coupled to the authoring system , for storing the phonetically encoded audio information signal ;
a personal radio system server , operatively coupled to the data repository via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the data repository upon a request by a system user ;
and at least one user terminal , operatively coupled to the server via a wireless communications network , the user terminal being adapted to generate and transmit the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and to receive and decode the at least a portion of the encoded audio information signal to synthesize an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the user terminal further including a plurality of prerecorded phonetic unit dictionaries selectively used to provide the audio signal with one of a plurality of narrative voices during playback to the user .
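US7979277B2 claim 3, charted just above, recites dynamically scheduling whether the first processor runs front-end or search-stage code based on availability of distance results and of buffer space. A minimal single-"processor" sketch of that scheduling policy, with the queue sizes, stage labels, and buffer limit all being illustrative assumptions:

```python
from collections import deque

def run_scheduler(audio_frames, max_buffered=2):
    """One 'processor' alternates between front-end and search work, preferring
    the search stage when distance results are available and the front end
    while feature-vector buffer space remains."""
    audio = deque(audio_frames)
    features, distances, log = deque(), deque(), []
    while audio or features or distances:
        if distances:
            # Distance results available: run the search stage on them.
            log.append(("search", distances.popleft()))
        elif audio and len(features) < max_buffered:
            # Buffer space available: run the front end on the next frame.
            features.append(audio.popleft())
            log.append(("front_end", features[-1]))
        else:
            # No schedulable work: hand a buffered feature to the distance unit.
            distances.append(features.popleft())
    return log

schedule = [task for task, _ in run_scheduler(["f0", "f1", "f2"])]
print(schedule)
```

The trace shows the processor filling the feature buffer first, then interleaving search work as distance results arrive, which is the behaviour the claim's scheduling condition describes.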

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (data repository) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
GB2336974A
CLAIM 1
A singlecast interactive radio system , comprising an authoring system for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
a data repository (first processor) , operatively coupled to the authoring system , for storing the phonetically encoded audio information signal ;
a personal radio system server , operatively coupled to the data repository via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the data repository upon a request by a system user ;
and at least one user terminal , operatively coupled to the server via a wireless communications network , the user terminal being adapted to generate and transmit the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and to receive and decode the at least a portion of the encoded audio information signal to synthesize an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the user terminal further including a plurality of prerecorded phonetic unit dictionaries selectively used to provide the audio signal with one of a plurality of narrative voices during playback to the user .

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (generating means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
GB2336974A
CLAIM 10
. An information signal delivery system , comprising information signal generating means (calculating circuit) for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
storage means , operatively coupled to the information signal generating means , for storing the phonetically encoded audio information signal ;
information signal retrieval means , operatively coupled to the storage means via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the storage means upon a request by a system user ;
and information request and delivery means , operatively coupled to the information signal retrieval means via a wireless communications network , the information request and delivery means including means for generating and transmitting the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and means for receiving and decoding the at least a portion of the encoded audio information signal to generate an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the information request and delivery means including means for selecting one of a plurality of narrative voices to be associated with the audio signal heard by the user .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
GB2336974A
CLAIM 27
. In a wireless information signal delivery system including at least one information signal content provider and at least one user terminal , the user terminal comprising control processing means ;
memory means operatively coupled to the processing means ;
wireless modem means , operatively coupled to the memory means , for demodulating a received phonetically encoded information signal representative of audio-based data provided by the at least one information signal content provider and for modulating a user-initiated signal ;
speech recognition means (speech accelerator) , operatively coupled to the control processing means , having speech input means for receiving a spoken utterance provided by a user , the spoken utterance representing at least a request for certain audio-based data provided by the at least one information signal content provider , the speech recognition means decoding the spoken utterance and presenting the decoded utterance to the control processing means , the control processing means generating a user-initiated signal in response thereto ;
speech synthesis means , operatively coupled to the memory means , for generating a synthesized speech signal in response to the demodulated phonetically encoded information signal , the speech synthesis means including a plurality of prerecorded phonetic unit dictionaries selectively used to provide the synthesized speech signal with one of a plurality of narrative voices during playback to the user ;
and audio playback means , operatively coupled to the speech synthesis means , for generating an audio signal in response to the synthesized speech signal for playback to a user .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals (control signal) to the search stage when the distances for a new frame are available in a result memory .
GB2336974A
CLAIM 26
. The user terminal of Claim 19 , wherein the spoken utterance represents audio playback commands and further wherein the speech recognizer decodes the spoken utterance and presents the decoded utterance to the control processor , the control processor generating a control signal (accelerator signals) in response thereto and providing the control signal to the audio playback means to control the audio playback .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (voice signals) signal .
GB2336974A
CLAIM 9
. The system of Claim 8 , wherein the telephony-related signals includes one of voice signals (digital audio, digital audio signal) and data signals .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (generating means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
GB2336974A
CLAIM 10
. An information signal delivery system , comprising information signal generating means (calculating circuit) for generating a phonetically encoded audio information signal in response to data received from an information content provider ;
storage means , operatively coupled to the information signal generating means , for storing the phonetically encoded audio information signal ;
information signal retrieval means , operatively coupled to the storage means via a wired communications network , for retrieving at least a portion of the phonetically encoded audio information signal from the storage means upon a request by a system user ;
and information request and delivery means , operatively coupled to the information signal retrieval means via a wireless communications network , the information request and delivery means including means for generating and transmitting the request made by the user to deliver the at least a portion of the phonetically encoded audio information signal thereto and means for receiving and decoding the at least a portion of the encoded audio information signal to generate an audio signal representative of at least a portion of the data received from the information content provider for playback to the user , the information request and delivery means including means for selecting one of a plurality of narrative voices to be associated with the audio signal heard by the user .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6718015B1

Filed: 1998-12-16     Issued: 2004-04-06

Remote web page reader

(Original Assignee) International Business Machines Corp     (Current Assignee) Google LLC

Viktors Berstis
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end (one link) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (personal communication) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
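The two-processor pipeline recited in claim 1 (front end and search stage on a first processor, distance calculation on a second, with data pipelined between them) can be illustrated by a minimal sketch. This is a hypothetical model only: the frame values, acoustic states, thread layout, and the simplified squared-Euclidean "distance" are all invented for illustration and are not the patent's implementation.

```python
import queue
import threading

feature_q = queue.Queue()   # front end -> calculating circuit
distance_q = queue.Queue()  # calculating circuit -> search stage

FRAMES = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]   # stand-in feature vectors
STATES = [[0.0, 0.0], [0.5, 0.5]]               # stand-in acoustic states

def front_end():
    # emit one feature vector per audio time frame
    for vec in FRAMES:
        feature_q.put(vec)
    feature_q.put(None)  # end-of-stream marker

def calculating_circuit():
    # compute a distance from the feature vector to every acoustic state
    while (vec := feature_q.get()) is not None:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, s)) for s in STATES]
        distance_q.put(dists)
    distance_q.put(None)

results = []
def search_stage():
    # consume distances as they arrive; here just keep the best state index
    while (dists := distance_q.get()) is not None:
        results.append(min(range(len(dists)), key=dists.__getitem__))

threads = [threading.Thread(target=f)
           for f in (front_end, calculating_circuit, search_stage)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The queues stand in for the pipelined data flow: each stage can work on a new frame while downstream stages still process earlier ones.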
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US6718015B1
CLAIM 6
. The method as described in claim 1 wherein the connection is selected from a group consisting essentially of a landline , a wireline , a satellite line , and a personal communication (first processor) s line .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end (one link) and search stage processing on the first processor (personal communication) .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US6718015B1
CLAIM 6
. The method as described in claim 1 wherein the connection is selected from a group consisting essentially of a landline , a wireline , a satellite line , and a personal communication (first processor) s line .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (personal communication) should run the front end (one link) or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US6718015B1
CLAIM 6
. The method as described in claim 1 wherein the connection is selected from a group consisting essentially of a landline , a wireline , a satellite line , and a personal communication (first processor) s line .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (personal communication) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6718015B1
CLAIM 6
. The method as described in claim 1 wherein the connection is selected from a group consisting essentially of a landline , a wireline , a satellite line , and a personal communication (first processor) s line .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end (one link) that the accelerator is ready to receive a next feature vector from the front end .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end (one link) is configured to input a digital audio signal .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end (one link) for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end (one link) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6718015B1
CLAIM 4
. The method as described in claim 1 wherein the given text of the Web page includes at least one link (front end) .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US20020006126A1

Filed: 1998-10-02     Issued: 2002-01-17

Methods and systems for accessing information from an information source

(Original Assignee) Motorola Inc     (Current Assignee) Motorola Solutions Inc

Gregory Johnson, David Ladd
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice recognition) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US20020006126A1
CLAIM 11
. The system of claim 7 , wherein the audio processing unit includes one of a voice recognition (speech accelerator) client and a voice recognition server .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (incoming call) , to the distance calculation , and to the word identification .
US20020006126A1
CLAIM 1
. A method to provide a user with information from an information source comprising the steps of : receiving an incoming call (feature calculation) from a user ;
connecting the incoming call to an audio processing unit ;
providing an audio message to the user ;
receiving an audio input from the user associated with a destination of the information source ;
processing the audio input received from the user ;
establishing a connection to the destination of the information source ;
retrieving information from the destination of the information source ;
processing the information ;
generating an output based upon at least a portion of the information ;
and providing an audio communication to the user based upon the output .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6138095A

Filed: 1998-09-03     Issued: 2000-10-24

Speech recognition

(Original Assignee) Lucent Technologies Inc     (Current Assignee) Nokia of America Corp

Sunil K. Gupta, Frank Kao-Ping Soong
US7979277B2
CLAIM 1
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (feature vectors) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors (calculating distances) ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .
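Steps (d) and (e) of the '095 claim can be restated numerically: normalize the log difference of the null and alternate hypothesis likelihoods by the magnitude of the null log-likelihood, then reject when the result falls below a threshold derived from utterance length (claim 6 models that threshold as a polynomial in utterance length). The sketch below uses invented log-likelihoods and invented polynomial coefficients purely for illustration.

```python
def normalized_score(log_null, log_alt):
    # step (d): log difference normalized by |log-likelihood of null hypothesis|
    return (log_null - log_alt) / abs(log_null)

def rejection_threshold(utterance_frames, coeffs=(0.02, -0.0001)):
    # hypothetical first-order polynomial in utterance length (cf. claim 6)
    a, b = coeffs
    return a + b * utterance_frames

log_null, log_alt = -120.0, -135.0   # stand-in log-likelihoods
score = normalized_score(log_null, log_alt)
thresh = rejection_threshold(100)
accepted = score >= thresh           # step (e): reject when below threshold
print(score, thresh, accepted)
```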

US7979277B2
CLAIM 2
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 3
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 4
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 5
. A speech recognition circuit (recognizing speech) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 6
. The speech recognition circuit (recognizing speech) of claim 1 , comprising control means (said input) adapted to implement frame dropping , to discard one or more audio time frames .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US6138095A
CLAIM 6
. A method for improving the accuracy of a speech recognition system utilizing a plurality of recognition models to identify whether input speech corresponds to phonemes within a task grammar or to general speech characteristics , comprising the steps of : generating a signal representing the log difference in likelihoods that the input signal corresponds to one of said phonemes or to said general speech characteristics ;
normalizing said difference signal according to the magnitude of the log likelihood of said general speech characteristics ;
modeling a rejection threshold as a polynomial in utterance length ;
and rejecting said input (control means) speech when said difference signal does not exceed said utterance length dependant rejection threshold .

US7979277B2
CLAIM 7
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 8
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 9
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the speech accelerator has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 10
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signal (result memory) s to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 11
. The speech recognition circuit (recognizing speech) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 12
. The speech recognition circuit (recognizing speech) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 13
. A speech recognition circuit (recognizing speech) of claim 1 , wherein said distance comprises a Mahalanobis distance .
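Claim 13 narrows the distance of claim 1 to a Mahalanobis distance. For a Gaussian acoustic state with a diagonal covariance (a common simplification in HMM acoustic models, assumed here for illustration; the mean and variance values are invented), the distance reduces to a variance-weighted Euclidean distance:

```python
def mahalanobis_diag(x, mean, var):
    # with a diagonal covariance, sqrt of sum of squared deviations
    # scaled by each dimension's variance
    return sum((xi - mi) ** 2 / vi
               for xi, mi, vi in zip(x, mean, var)) ** 0.5

x    = [1.0, 2.0]   # stand-in feature vector
mean = [0.0, 0.0]   # state mean
var  = [1.0, 4.0]   # per-dimension variances
d = mahalanobis_diag(x, mean, var)
print(d)  # sqrt(1/1 + 4/4) = sqrt(2)
```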
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 14
. A speech recognition circuit (recognizing speech) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (recognizing speech) .
US6138095A
CLAIM 1
. A method of recognizing speech (speech recognition circuit, word identification) encoded as electrical signals , comprising : a) processing input speech signals to extract one or more feature vectors ;
b) comparing the extracted feature vectors with stored speech models and a task grammar to derive a null hypothesis factor indicating the probability that the input speech is correctly recognized ;
c) comparing the extracted feature vectors with a general speech model and a garbage loop grammar to derive an alternate hypothesis factor indicating the probability that the extracted feature vectors correspond to the characteristics of general speech ;
d) normalizing the log difference of the null and alternate hypothesis probability factors by the magnitude of the log likelihood of the null hypothesis factor ;
and e) rejecting the speech where the difference of the normalized probability factors is less than a rejection factor derived from the utterance length .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
EP0901000A2

Filed: 1998-07-30     Issued: 1999-03-10

Message processing system and method for processing messages

(Original Assignee) Toyota Motor Corp     (Current Assignee) Toyota Motor Corp

Taizo Asaoka, Naoki Maeda, Hiroyuki Kanemitsu, Masanobu Yamashita
US7979277B2
CLAIM 1
. A speech recognition circuit (second voice, first voice) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (output timing) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
EP0901000A2
CLAIM 9
A message processing device for a vehicle , comprising : receiving means (32) for receiving outside information sent from outside ;
voice tone memorizing means (39) for storing a plurality of different voice tones ;
voice reading means (33 , 34) for reading aloud said outside information by using one voice tone stored in said voice tone memorizing means (39) ;
a voice navigator (34) for providing voice guidance information to a driver ;
and adjusting means (33) for adjusting an output timing (calculating circuit, search stage processing) of when the voice guidance information is read aloud and when the electrical information is read aloud to prevent the voice guidance information and the electrical information from being read aloud simultaneously .

EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 2
. A speech recognition circuit (second voice, first voice) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (output timing) on the first processor .
EP0901000A2
CLAIM 9
A message processing device for a vehicle , comprising : receiving means (32) for receiving outside information sent from outside ;
voice tone memorizing means (39) for storing a plurality of different voice tones ;
voice reading means (33 , 34) for reading aloud said outside information by using one voice tone stored in said voice tone memorizing means (39) ;
a voice navigator (34) for providing voice guidance information to a driver ;
and adjusting means (33) for adjusting an output timing (calculating circuit, search stage processing) of when the voice guidance information is read aloud and when the electrical information is read aloud to prevent the voice guidance information and the electrical information from being read aloud simultaneously .

EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 3
. A speech recognition circuit (second voice, first voice) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

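For orientation only, the dynamic-scheduling limitation of claim 3 (choosing whether the first processor runs front end or search stage code based on availability of distance results and buffer space) can be sketched as a simple decision function. This is an illustrative reading of the claim language, not the patented implementation; all names are hypothetical.

```python
def next_task(distances_ready, space_for_features):
    # Dynamically pick which code the shared first processor runs next:
    # prefer the search stage when distance results are available,
    # otherwise run the front end if there is space to store more
    # feature vectors; stall only when neither condition holds.
    if distances_ready:
        return "search_stage"
    if space_for_features:
        return "front_end"
    return "wait"
```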
US7979277B2
CLAIM 4
. A speech recognition circuit (second voice, first voice) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 5
. A speech recognition circuit (second voice, first voice) as claimed in claim 1 , wherein the said calculating circuit (output timing) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
EP0901000A2
CLAIM 9
A message processing device for a vehicle , comprising : receiving means (32) for receiving outside information sent from outside ;
voice tone memorizing means (39) for storing a plurality of different voice tones ;
voice reading means (33 , 34) for reading aloud said outside information by using one voice tone stored in said voice tone memorizing means (39) ;
a voice navigator (34) for providing voice guidance information to a driver ;
and adjusting means (33) for adjusting an output timing (calculating circuit, search stage processing) of when the voice guidance information is read aloud and when the electrical information is read aloud to prevent the voice guidance information and the electrical information from being read aloud simultaneously .

EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 6
. The speech recognition circuit (second voice, first voice) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 7
. The speech recognition circuit (second voice, first voice) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

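Claim 7 recites a feature vector comprising spectral components of an audio signal for a predetermined time frame. As a minimal illustration of that limitation (not the patent's front end), the sketch below computes DFT magnitudes for one frame; a practical front end would add windowing, an FFT, and mel filterbank/MFCC processing.

```python
import cmath

def spectral_features(frame):
    # Magnitudes of a naive DFT over one audio time frame: each entry
    # is one spectral component of the signal within that frame.
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
```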
US7979277B2
CLAIM 8
. The speech recognition circuit (second voice, first voice) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 9
. The speech recognition circuit (second voice, first voice) of claim 1 , wherein the speech accelerator (second voice, first voice) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 10
. The speech recognition circuit (second voice, first voice) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .
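Claims 9 and 10 describe a handshake between the front end, the accelerator, and the search stage: the accelerator signals when it is ready for the next feature vector and when distances for a new frame are available. A minimal sketch of that flow control, using bounded queues in place of interrupt signals (all names illustrative, not the patented circuit):

```python
import queue
import threading

def accelerator(feats, dists):
    # Stand-in for the second processor: turns each feature vector into
    # a distance result and signals availability by enqueueing it.
    while True:
        f = feats.get()
        if f is None:          # shutdown sentinel
            dists.put(None)
            return
        dists.put(sum(x * x for x in f))   # toy distance calculation

def recognize(frames):
    # Bounded queues model the handshake: put() blocks until the
    # accelerator is ready for the next feature vector, and get()
    # blocks until distances for a new frame are available.
    feats = queue.Queue(maxsize=1)
    dists = queue.Queue(maxsize=1)
    worker = threading.Thread(target=accelerator, args=(feats, dists))
    worker.start()
    results = []
    for f in frames:
        feats.put(f)
        results.append(dists.get())
    feats.put(None)
    dists.get()                # drain the sentinel
    worker.join()
    return results
```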

US7979277B2
CLAIM 11
. The speech recognition circuit (second voice, first voice) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 12
. The speech recognition circuit (second voice, first voice) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 13
. A speech recognition circuit (second voice, first voice) of claim 1 , wherein said distance comprises a Mahalanobis distance .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

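Claim 13 narrows the claimed distance to a Mahalanobis distance. For reference, a minimal sketch of that computation between a feature vector and one acoustic state, assuming a diagonal-covariance Gaussian (a common simplification in HMM acoustic models; not drawn from the patent's specification):

```python
def mahalanobis_sq(feature, mean, var):
    # Squared Mahalanobis distance under a diagonal covariance:
    # sum over dimensions of (x - mu)^2 / sigma^2.
    return sum((f - m) ** 2 / v for f, m, v in zip(feature, mean, var))
```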
US7979277B2
CLAIM 14
. A speech recognition circuit (second voice, first voice) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
EP0901000A2
CLAIM 17
A computer readable medium including a message processing program that performs the steps of : receiving a message from an outside source ;
reading aloud the message from the outside source using a first voice (speech accelerator, speech recognition circuit) tone ;
and reading aloud a message from a second source different from said outside source using a second voice (speech accelerator, speech recognition circuit) tone that is different from said first voice tone .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (output timing) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
EP0901000A2
CLAIM 9
A message processing device for a vehicle , comprising : receiving means (32) for receiving outside information sent from outside ;
voice tone memorizing means (39) for storing a plurality of different voice tones ;
voice reading means (33 , 34) for reading aloud said outside information by using one voice tone stored in said voice tone memorizing means (39) ;
a voice navigator (34) for providing voice guidance information to a driver ;
and adjusting means (33) for adjusting an output timing (calculating circuit, search stage processing) of when the voice guidance information is read aloud and when the electrical information is read aloud to prevent the voice guidance information and the electrical information from being read aloud simultaneously .

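Independent claims 1, 14, and 15 share the same three-stage pipelined data flow: front end (feature vectors) to calculating circuit (distances) to search stage (word identification in a lexical tree). A minimal generator-based sketch of that flow, with toy stand-ins for each stage (all names and the per-stage math are illustrative assumptions, not the claimed circuit):

```python
def front_end(samples, frame_size=4):
    # Stage 1: split the audio into fixed-size frames and emit a
    # "feature vector" per frame (here just the raw samples).
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        yield samples[i:i + frame_size]

def distance_stage(features, states):
    # Stage 2: per frame, a distance to each acoustic state mean
    # (squared Euclidean here, as a toy stand-in).
    for f in features:
        yield [sum((a - b) ** 2 for a, b in zip(f, s)) for s in states]

def search_stage(distance_stream):
    # Stage 3: pick the best-scoring state per frame (a stand-in for
    # the lexical-tree search over calculated distances).
    return [min(range(len(d)), key=d.__getitem__) for d in distance_stream]
```

Because each stage consumes the previous stage's generator, a frame can move through the chain while the front end is still producing later frames, which is the pipelining property the claims recite.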



US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
WO9857489A2

Filed: 1998-06-08     Issued: 1998-12-17

Modular system for accelerating data searches and data stream operations

(Original Assignee) Metalithic Systems, Inc.     

James M. O'reilly, Daryl Eigen
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (processing data) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
WO9857489A2
CLAIM 1
. A data processing module adapted to be connected to a computer for use with a computer , the computer including a memory for storing data , the module comprising : a module memory for storing data ;
and a programmable logic device connected to said module memory and adapted to be connected to the computer for receiving data stored in said module memory and the computer memory for processing data (interrupt signal) .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (programmable logic device) .
WO9857489A2
CLAIM 1
. A data processing module adapted to be connected to a computer for use with a computer , the computer including a memory for storing data , the module comprising : a module memory for storing data ;
and a programmable logic device (result memory) connected to said module memory and adapted to be connected to the computer for receiving data stored in said module memory and the computer memory for processing data .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6456965B1

Filed: 1998-05-19     Issued: 2002-09-24

Multi-stage pitch and mixed voicing estimation for harmonic speech coders

(Original Assignee) Texas Instruments Inc     (Current Assignee) Texas Instruments Inc

Suat Yeldener
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time domain waveform) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveform (audio time frame, time frame) s , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (time domain waveform) .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveform (audio time frame, time frame) s , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal (result memory) , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveforms , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (said time) signal .
US6456965B1
CLAIM 5
. The method of claim 1 , wherein said time (digital audio, digital audio signal) domain pitch estimation process is an analysis by synthesis process .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time domain waveform) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveform (audio time frame, time frame) s , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time domain waveform) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveform (audio time frame, time frame) s , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (time domain waveform) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6456965B1
CLAIM 1
. A method of estimating the pitch of a segment of a speech signal , comprising the steps of : selecting a set of initial pitch candidates by dividing the pitch range into sub-ranges , applying a pitch cost function to input samples , and selecting a pitch candidate for each said sub-range for which the pitch cost function is maximized , determining an input pitch period using at least one previously calculated pitch value from prior segments of said speech signal ;
determining whether said determined pitch period from prior segments is short or long ;
and for each pitch candidate , if said average pitch period is short having just a few harmonics such that it is easier to match time domain waveform (audio time frame, time frame) s , using a time domain pitch estimation process to evaluate each said pitch candidate , or if said average pitch period is long being more than a few harmonics and not easier to match time domain waveforms , using a frequency domain pitch estimation process to evaluate each said pitch candidate .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6593956B1

Filed: 1998-05-15     Issued: 2003-07-15

Locating an audio source

(Original Assignee) Polycom Inc     (Current Assignee) Polycom Inc

Steven L. Potts, Hong Wang, Wendi Beth Rabiner, Peter L. Chu
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (video signals, audio sources, video image) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (positioning device) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6593956B1
CLAIM 4
. The system of claim 1 wherein the image pickup device includes a positioning device (calculating means, calculating distances) for positioning said image pickup device , wherein the audio source locator supplies control signals to the positioning device for positioning the image pickup device , the control signals being generated based on the determined direction of the audio source .

US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video image (digital audio, audio signal, interrupt signal, digital audio signal) s and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (video signals, audio sources, video image) for a predetermined time frame .
US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video image (digital audio, audio signal, interrupt signal, digital audio signal) s and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (video signals, audio sources, video image) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video image (digital audio, audio signal, interrupt signal, digital audio signal) s and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (video signals, audio sources, video image) signal .
US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video images (digital audio, audio signal, interrupt signal, digital audio signal) and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (video signals, audio sources, video image) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (positioning device) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
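Claim 14's three stages connected for pipelined data flow can be sketched as a chain of streaming stages, each consuming the previous stage's output frame by frame. This is an illustrative model only; the stage internals (energy features, squared-Euclidean distances, a one-step word pick) are invented stand-ins, not the patent's design:

```python
# Front end -> distance calculation -> search stage, as a streaming pipeline.

def front_end(audio_frames):
    # Derive a tiny 'feature vector' per audio time frame.
    for frame in audio_frames:
        energy = sum(abs(s) for s in frame)
        yield [energy, max(frame)]

def distance_stage(feature_vectors, states):
    # Distance of each feature vector to each predetermined acoustic state.
    for vec in feature_vectors:
        yield [sum((a - b) ** 2 for a, b in zip(vec, st)) for st in states]

def search_stage(distances_stream, words):
    # Toy 'search': pick the word owning the best-scoring (lowest) state.
    for dists in distances_stream:
        yield words[min(range(len(dists)), key=dists.__getitem__)]

states = [[0, 0], [10, 5]]        # two invented acoustic states
words = ["sil", "yes"]            # word attached to each state
audio = [[1, -1, 2], [5, 5, 5]]   # two audio time frames
hyp = list(search_stage(distance_stage(front_end(audio), states), words))
```

Because each stage yields as soon as a frame is ready, frame *t* can be searched while frame *t+1* is still in the front end, which is the pipelining property the claim recites.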
US6593956B1
CLAIM 4
. The system of claim 1 wherein the image pickup device includes a positioning device (calculating means, calculating distances) for positioning said image pickup device , wherein the audio source locator supplies control signals to the positioning device for positioning the image pickup device , the control signals being generated based on the determined direction of the audio source .

US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video images (digital audio, audio signal, interrupt signal, digital audio signal) and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (video signals, audio sources, video image) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
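One common reading of claim 15's "distance indicating the similarity between a feature vector and a predetermined acoustic state" is the negative log likelihood of the vector under a diagonal-covariance Gaussian state model. A sketch under that assumption (the Gaussian parameterization is a standard choice, not confirmed by the claim text):

```python
import math

def neg_log_likelihood(feature_vector, mean, var):
    """Negative log likelihood of a feature vector under a
    diagonal-covariance Gaussian acoustic state: lower = more similar."""
    nll = 0.0
    for x, m, v in zip(feature_vector, mean, var):
        nll += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return nll

state_mean, state_var = [1.0, 2.0], [1.0, 1.0]
d_near = neg_log_likelihood([1.0, 2.0], state_mean, state_var)   # on the mean
d_far = neg_log_likelihood([5.0, -3.0], state_mean, state_var)   # far away
```

A vector on the state mean scores the constant floor `log(2*pi)` here, and scores grow quadratically with distance from the mean, which is why such likelihood terms function as "distances" in the search.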
US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video images (digital audio, audio signal, interrupt signal, digital audio signal) and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (video signals, audio sources, video image) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
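Claim 16's feature-vector code element ("extracted and/or derived quantities from said audio signal during a defined audio time frame") can be sketched as framing plus a couple of per-frame quantities. The specific features below (log energy, zero-crossing count) and the frame length are toy choices for illustration, not the patent's feature set:

```python
import math

def frame_features(samples, frame_len=4):
    """Split audio samples into fixed time frames and compute a small
    feature vector (log energy, zero-crossing count) per frame."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame)
        zc = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        feats.append([math.log(energy + 1.0), zc])
    return feats

feats = frame_features([1, -1, 1, -1, 2, 2, 2, 2])
```

A real front end would typically use overlapping windows and cepstral features, but the structural point is the same: one fixed-length vector of derived quantities per defined time frame.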
US6593956B1
CLAIM 6
. The system of claim 1 wherein the image signals represent frames of video images (digital audio, audio signal, interrupt signal, digital audio signal) and the audio source is a speaking person , the audio source locator detecting an image of the face of the speaking person in one of the frames of video .

US6593956B1
CLAIM 7
. The system of claim 6 wherein the audio source locator detects the image of the face of the speaking person by detecting the speaking person based on the audio signals , detecting images of the faces of a plurality of persons based on the video signals (digital audio, audio signal, interrupt signal, digital audio signal) , and correlating the detected images to the speaking person to detect the image of the face of the speaking person .

US6593956B1
CLAIM 25
. The system of claim 1 wherein the audio based locator detects a plurality of audio sources (digital audio, audio signal, interrupt signal, digital audio signal) and uses a parameter to determine whether to validate at least one of the plurality of audio sources to use in producing control signals for the image pickup device , wherein changing the parameter in one direction increases a likelihood of audio based locator validating said at least one of the plurality of audio sources and changing that parameter in another direction decreases the likelihood , and wherein the audio based locator correlates the audio based direction of the audio source with the stored video based location of the image in one of the frames of video to determine whether the image in the one of the frames of video corresponds to the audio source , and if the image in the one of the frames of video corresponds to the audio source , the audio based locator changes the parameter in the one direction .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6757652B1

Filed: 1998-03-03     Issued: 2004-06-29

Multiple stage speech recognizer

(Original Assignee) Koninklijke Philips Electronics NV     (Current Assignee) Koninklijke Philips NV

Michael Lund, Karl Wright, Wensheng Fan
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (first processor) , and said calculating circuit is implemented using a second processor (second processors) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
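The "lexical tree comprising a model of words" recited in claim 1 is conventionally a prefix tree over vocabulary pronunciations, so words sharing initial phones share nodes. A minimal sketch with an invented two-word lexicon (the `"#word"` marker key is a hypothetical convention of this example):

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree: each phone descends one node; a word-end
    node carries the finished word under the '#word' marker key."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})
        node["#word"] = word
    return root

def words_along(tree, phones):
    """Collect words whose full pronunciation matches a prefix of `phones`."""
    out, node = [], tree
    for p in phones:
        if "#word" in node:
            out.append(node["#word"])
        node = node.get(p)
        if node is None:
            return out
    if "#word" in node:
        out.append(node["#word"])
    return out

tree = build_lexical_tree({"cat": ["k", "ae", "t"], "cab": ["k", "ae", "b"]})
found = words_along(tree, ["k", "ae", "t"])
```

Here "cat" and "cab" share the `k -> ae` path and only split at the final phone, which is the storage and scoring economy the lexical-tree organization exists to provide.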
US6757652B1
CLAIM 1
. Software stored on a computer readable medium for causing a multiprocessor computer to perform the function of recognizing an utterance spoken by a speaker , including : software for causing a first processor (first processor) to perform the functions of : computing a series of segments associated with the utterance , each segment having a time interval with the utterance , and a plurality of scores characterizing the degree of match of the utterance in that time interval with a first plurality of subword units , and sharing a portion of the series of segments with a second processor ;
and software for causing a second processor to perform the functions of : determining a plurality of word sequences hypotheses associated with the utterance , computing scores for the plurality of word sequence hypotheses , using a second plurality of subword units to represent words in the word sequence hypotheses , recognizing the utterance using the scores , wherein the first and second processors (second processor) are substantially independent and asynchronous of each other , and wherein the first plurality of subword units is a set of phonemes , and the second plurality of subword units is a set of context dependent phonemes .
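The '652 claim's arrangement, a first processor scoring coarse subword segments and sharing them with a substantially independent second processor that forms word hypotheses, can be sketched with two threads joined by a queue. This is a loose structural sketch only: the lexicon, scores, and the trivial prefix-matching "second pass" are invented for the example:

```python
import threading
import queue

segments = queue.Ueue() if False else queue.Queue()  # shared segment channel
LEXICON = {"hi": ["h", "ay"], "a": ["ah"]}           # invented pronunciations

def first_pass(utterance_frames):
    # First worker: emit (time, phone, score) segments as they are scored.
    for t, frame in enumerate(utterance_frames):
        segments.put((t, frame["phone"], frame["score"]))
    segments.put(None)  # end-of-utterance marker

def second_pass(results):
    # Second worker: consume segments asynchronously, hypothesize words.
    phones = []
    while True:
        item = segments.get()
        if item is None:
            break
        phones.append(item[1])
    for word, pron in LEXICON.items():
        if phones[:len(pron)] == pron:
            results.append(word)

frames = [{"phone": "h", "score": 0.9}, {"phone": "ay", "score": 0.8}]
results = []
t2 = threading.Thread(target=second_pass, args=(results,))
t2.start()
first_pass(frames)
t2.join()
```

The two workers run without mutual synchronization beyond the queue itself, which is the "substantially independent and asynchronous" relationship the claim describes.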

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (first processor) .
US6757652B1
CLAIM 1
. Software stored on a computer readable medium for causing a multiprocessor computer to perform the function of recognizing an utterance spoken by a speaker , including : software for causing a first processor (first processor) to perform the functions of : computing a series of segments associated with the utterance , each segment having a time interval with the utterance , and a plurality of scores characterizing the degree of match of the utterance in that time interval with a first plurality of subword units , and sharing a portion of the series of segments with a second processor ;
and software for causing a second processor to perform the functions of : determining a plurality of word sequences hypotheses associated with the utterance , computing scores for the plurality of word sequence hypotheses , using a second plurality of subword units to represent words in the word sequence hypotheses , recognizing the utterance using the scores , wherein the first and second processors are substantially independent and asynchronous of each other , and wherein the first plurality of subword units is a set of phonemes , and the second plurality of subword units is a set of context dependent phonemes .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (first processor) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
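Claim 3's dynamic scheduling, choosing whether the first processor runs front-end or search-stage code based on availability of distance results and buffer space, can be sketched as a simple priority rule. The specific policy below (search first, front end if buffers permit, otherwise idle) is invented for illustration:

```python
def pick_task(distance_results_ready, feature_buf_free, result_buf_free):
    """Toy scheduling rule in the spirit of claim 3: run the search stage
    when distance results are waiting; otherwise run the front end if there
    is room to store more feature vectors and results; else idle."""
    if distance_results_ready:
        return "search"
    if feature_buf_free and result_buf_free:
        return "front_end"
    return "idle"

choices = [
    pick_task(True, True, True),     # results pending -> search
    pick_task(False, True, True),    # no results, buffers free -> front end
    pick_task(False, False, True),   # feature buffer full -> idle
]
```

Such a rule keeps one processor usefully busy on whichever stage can make progress, which is what lets the front end and search stage time-share it.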
US6757652B1
CLAIM 1
. Software stored on a computer readable medium for causing a multiprocessor computer to perform the function of recognizing an utterance spoken by a speaker , including : software for causing a first processor (first processor) to perform the functions of : computing a series of segments associated with the utterance , each segment having a time interval with the utterance , and a plurality of scores characterizing the degree of match of the utterance in that time interval with a first plurality of subword units , and sharing a portion of the series of segments with a second processor ;
and software for causing a second processor to perform the functions of : determining a plurality of word sequences hypotheses associated with the utterance , computing scores for the plurality of word sequence hypotheses , using a second plurality of subword units to represent words in the word sequence hypotheses , recognizing the utterance using the scores , wherein the first and second processors are substantially independent and asynchronous of each other , and wherein the first plurality of subword units is a set of phonemes , and the second plurality of subword units is a set of context dependent phonemes .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (first processor) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6757652B1
CLAIM 1
. Software stored on a computer readable medium for causing a multiprocessor computer to perform the function of recognizing an utterance spoken by a speaker , including : software for causing a first processor (first processor) to perform the functions of : computing a series of segments associated with the utterance , each segment having a time interval with the utterance , and a plurality of scores characterizing the degree of match of the utterance in that time interval with a first plurality of subword units , and sharing a portion of the series of segments with a second processor ;
and software for causing a second processor to perform the functions of : determining a plurality of word sequences hypotheses associated with the utterance , computing scores for the plurality of word sequence hypotheses , using a second plurality of subword units to represent words in the word sequence hypotheses , recognizing the utterance using the scores , wherein the first and second processors are substantially independent and asynchronous of each other , and wherein the first plurality of subword units is a set of phonemes , and the second plurality of subword units is a set of context dependent phonemes .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6081779A

Filed: 1998-03-02     Issued: 2000-06-27

Language model adaptation for automatic speech recognition

(Original Assignee) US Philips Corp     (Current Assignee) US Philips Corp

Stefan Besling, Hans-Gunter Meier
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances (following steps) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (reference values) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values (first processor, distance results, control means) determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps (calculating means, speech recognition method, calculating distances) : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .
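The adaptation step the '779 claim recites, scale the basic language-model values, keep scaled values that fall inside their confidence intervals, clamp values outside to the nearest boundary, and fall back to the basic value where no interval was measured, can be sketched directly. The numbers and the scaling factor below are illustrative; the claim derives the factor from an optimization criterion not modeled here:

```python
def adapt_lm(basic_values, intervals, scale):
    """Scale basic LM values, then clamp each scaled value into its
    confidence interval; words without a measured interval keep their
    basic value (per the claim's fallback)."""
    adapted = {}
    for word, v in basic_values.items():
        if word not in intervals:
            adapted[word] = v          # no interval measured: keep basic value
            continue
        lo, hi = intervals[word]
        adapted[word] = min(max(scale * v, lo), hi)
    return adapted

adapted = adapt_lm(
    {"the": 0.5, "cat": 0.2, "zebu": 0.01},
    {"the": (0.4, 0.6), "cat": (0.05, 0.1)},
    scale=0.9,
)
```

Here the scaled value for "the" survives unchanged inside its interval, "cat" is clamped to its upper boundary, and "zebu" retains its basic value, covering the claim's three cases.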

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (reference values) .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values (first processor, distance results, control means) determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (reference values) should run the front end or search stage code , based on availability or unavailability of distance results (reference values) and/or availability of space for storing more feature vectors and/or distance results .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values (first processor, distance results, control means) determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (reference values) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values (first processor, distance results, control means) determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 6
. The speech recognition circuit of claim 1 , comprising control means (reference values) adapted to implement frame dropping , to discard one or more audio time frames .
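Claim 6's frame dropping, a control means that discards audio time frames, is typically a load-shedding measure. A sketch under that assumption; the backlog threshold and the drop-every-other-frame policy are invented for illustration:

```python
def drop_frames(frames, backlog, high_water=3):
    """Toy frame-dropping control: when the processing backlog exceeds a
    threshold, discard every other incoming audio time frame so the
    recognizer keeps up with real time."""
    if backlog < high_water:
        return list(frames)
    return [f for i, f in enumerate(frames) if i % 2 == 0]

kept_normal = drop_frames(["f0", "f1", "f2", "f3"], backlog=1)  # keep all
kept_loaded = drop_frames(["f0", "f1", "f2", "f3"], backlog=5)  # shed load
```

Dropping frames trades some acoustic evidence for bounded latency, which is why it sits under a control means rather than running unconditionally.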
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values (first processor, distance results, control means) determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal (result memory) and compared with reference values determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (following steps) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps (calculating means, speech recognition method, calculating distances) : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 15
. A speech recognition method (following steps) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps (calculating means, speech recognition method, calculating distances) : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (following steps) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6081779A
CLAIM 1
. A method of adapting a language model with language model values for automatic speech recognition , in which test values are derived from a speech signal and compared with reference values determining a given vocabulary , there being derived scores which are linked to language model values at word boundaries , the language model values being dependent on the probability of occurrence of a given word of the vocabulary in dependence on at least one predecessor word , said method including the following steps (calculating means, speech recognition method, calculating distances) : determination of a basic language model with basic language model values on the basis of training speech signals , determination , utilizing statistic calculation methods , of confidence intervals , having an upper and a lower boundary for language model values , on the basis of a different speech signal which deviates from the training speech signals , determination of a scaling factor in such a manner that the basic language model values scaled thereby satisfy an optimization criterion which the position of the scaled language model values are within the confidence intervals , use of scaled language model values which are situated in the confidence intervals and , in the case of scaled language model values situated beyond the boundaries of the confidence intervals , the nearest boundary as adapted language model values and , for confidence intervals not determined from the different speech signal , the basic language model values for the further recognition of the different speech signal .
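
One conventional reading of the claimed "distance indicating the similarity between a feature vector and a predetermined acoustic state" is a negative log-likelihood under a Gaussian state model, as is common in HMM-based recognisers. The sketch below assumes that reading; the means, variances, and vectors are illustrative only:

```python
import math

def neg_log_likelihood(fv, mean, var):
    # Distance = negative log-likelihood under a diagonal-covariance
    # Gaussian: a smaller value means the feature vector is closer
    # (more similar) to the acoustic state.
    nll = 0.0
    for x, m, v in zip(fv, mean, var):
        nll += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return nll

d_near = neg_log_likelihood([1.0, 2.0], [1.0, 2.0], [1.0, 1.0])
d_far = neg_log_likelihood([5.0, 9.0], [1.0, 2.0], [1.0, 1.0])
print(d_near < d_far)  # → True: the matching vector scores a smaller distance
```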




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US5983180A

Filed: 1998-02-27     Issued: 1999-11-09

Recognition of sequential data using finite state sequence models organized in a tree structure

(Original Assignee) SoftSound Ltd     (Current Assignee) Longsand Ltd

Anthony John Robinson
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances (determining means) indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US5983180A
CLAIM 17
. A method according to claim 1 wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

US5983180A
CLAIM 18
. Recognition apparatus for recognising sequential tokens grouped into one or more items , the apparatus comprising : storage means for storing data representing known items as respective finite state sequence models , where each state corresponds to a token and said models having common prefix states are organised in a tree structure such that suffix states comprise branches from common prefix states and there are a plurality of tree structures each having a different prefix state ;
comparing means for comparing each sequential data unit with stored reference data units identified by respective reference tokens to generate scores for each data unit indicating the similarity of the data unit to respective said reference data units ;
determining means (calculating distances) for determining an accumulated score for a final state in each of a number of the models comprising a) means for sequentially calculating the accumulated score for a model to reach the final state comprising a leaf in the tree , b) means for identifying the closest branch to the leaf corresponding to a next model for which an accumulated score for the final state has not yet been calculated , and c) means for accumulating the score from the identified closest branch for the next model to the final state , wherein the scores are accumulated for the branches of the tree and accumulated for the plurality of trees ;
and means for identifying at least the item corresponding to the model having the highest accumulated score .
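
The tree organisation recited in US5983180A claim 18 (models with common prefix states shared, suffix states branching from them) can be sketched as a trie. This toy version uses letters as tokens purely for illustration; real lexical trees use phone or state identifiers:

```python
# Illustrative lexical tree: "cat" and "car" share the prefix states
# for "c" and "a"; the suffix states "t" and "r" branch from them.

def build_lexical_tree(words):
    root = {}
    for word in words:
        node = root
        for token in word:
            node = node.setdefault(token, {})
        node["*"] = word  # leaf marker: a complete item (word)
    return root

def count_states(node):
    # Count branch states in the tree, excluding leaf markers.
    return sum(1 + count_states(child)
               for token, child in node.items() if token != "*")

tree = build_lexical_tree(["cat", "car", "dog"])
print(count_states(tree))  # → 7 states: c, a, t, r, d, o, g
```

Sharing prefix states is what lets accumulated scores for a common prefix be reused across all words in the branch, rather than recomputed per word.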

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (temporal alignment) for a predetermined time frame (speech data) .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US5983180A
CLAIM 17
. A method according to claim 1 wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .
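
Claim 7's "plurality of spectral components of an audio signal for a predetermined time frame" is commonly obtained via a frame-wise Fourier transform. A naive DFT is sketched below for self-containment; the frame length and signal are illustrative:

```python
import cmath

def magnitude_spectrum(frame):
    # Magnitude spectrum of one audio frame via a naive DFT
    # (real input, so only bins 0 .. n/2 are kept).
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

frame = [0.0, 1.0, 0.0, -1.0]  # one cycle of a sinusoid at bin 1
spec = magnitude_spectrum(frame)
print([round(s, 6) for s in spec])  # → [0.0, 2.0, 0.0]
```

A production front end would use an FFT over windowed, overlapping frames; the point here is only that each time frame yields a vector of spectral components.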

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (temporal alignment) .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means (calculating means) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US5983180A
CLAIM 17
. A method according to claim 1 wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

US5983180A
CLAIM 25
. Recognition apparatus according to claim 24 including second storage means , for storing the accumulated scores for the temporal states for a final state of a model of an item as a group together with information identifying a grouping model state used to arrive at the final state , information identifying positions in the tree structure and information identifying the temporal position of the accumulated scores ;
reading means for reading a group of accumulated scores from said second storage means ;
wherein said calculating means (calculating means) is adapted to use the said accumulated scores for the plurality of temporal states as a plurality of temporally different initial scores for the calculation of the accumulated scores for the plurality of temporal states of a final state of a model of a subsequent item ;
and including means for determining the grouping model state used to arrive at the final state of the model of the subsequent item from the identification of the subsequent item and the grouping model state of the read group of accumulated scores .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (temporal alignment) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US5983180A
CLAIM 17
. A method according to claim 1 wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (temporal alignment) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US5983180A
CLAIM 16
. A method according to claim 12 wherein if a said determined grouping model state is already stored in said memory means , the calculated accumulated scores and associated information are merged with the stored group of accumulated scores and associated information in temporal alignment (audio signal, digital audio signal) whereby if there is temporal overlap between accumulated scores , the highest accumulated score is stored in said memory means together with the respective associated information .

US5983180A
CLAIM 17
. A method according to claim 1 wherein said data comprises digitised speech , said sequential data units comprise sample frames of the speech data (audio time frame, time frame) , said tokens comprise units of speech , and said items comprise spoken words .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6353661B1

Filed: 1997-12-18     Issued: 2002-03-05

Network and communication access systems

(Original Assignee) Bailey, Iii John Edson     

John Edson Bailey, III
US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing (volume level) on the first processor .
US6353661B1
CLAIM 5
. The system of claim 1 , wherein the processing means further comprises means for determining whether the at least one audible navigation signal exceeds a predetermined volume level (search stage processing) and , if so , forwarding the at least one audible navigation signal to the accessing means .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio (continuous fashion) signal .
US6353661B1
CLAIM 21
. The system of claim 19 , wherein the audible navigation signal corresponds to a command to navigate through the content in a continuous fashion (digital audio) , a stepped fashion , or a combination thereof .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (display portion) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6353661B1
CLAIM 18
. The system of claim 14 , wherein the telephone further comprises at least one key which corresponds to a selected navigation command and the system further comprises : (e) a visually perceptible display having at least one display portion (calculating means) associated with each of the at least one keys ;
and (f) means for displaying , on the visually perceptible display , selected text and/or images corresponding to the user designated information .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JPH11119791A

Filed: 1997-10-20     Issued: 1999-04-30

Speech emotion recognition system and method

(Original Assignee) Hitachi Ltd; Hitachi Ulsi Systems Co Ltd

Shinji Wakizaka, Kazuo Kondo, Yasunari Obuchi, Tetsuji Toushita, Yasuyo Ishikawa
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal (データ) for a predetermined time frame .
JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (認識結果) .
JPH11119791A
CLAIM 1
. In a speech recognition system in which the words and sentences to be targets of speech recognition are collected and defined as a dictionary and , as speech recognition results (result memory) , those words and sentences are picked up from the dictionary unit and output using character string display or speech synthesis , a speech emotion recognition system comprising : a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and a level indicating the degree of the speaker's emotion carried in the speech is output .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (データ) .
JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal (データ) using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words (の変形) within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JPH11119791A
CLAIM 1
. In a speech recognition system in which the words and sentences to be targets of speech recognition are collected and defined as a dictionary and , as speech recognition results , those words and sentences are picked up from the dictionary unit and output using character string display or speech synthesis , a speech emotion recognition system comprising : a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum (search stage to identify words, processor to identify words) caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and a level indicating the degree of the speaker's emotion carried in the speech is output .

JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal (データ) , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words (の変形) within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
JPH11119791A
CLAIM 1
. In a speech recognition system in which the words and sentences to be targets of speech recognition are collected and defined as a dictionary and , as speech recognition results , those words and sentences are picked up from the dictionary unit and output using character string display or speech synthesis , a speech emotion recognition system comprising : a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum (search stage to identify words, processor to identify words) caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and a level indicating the degree of the speaker's emotion carried in the speech is output .

JPH11119791A
CLAIM 5
. A speech emotion recognition method comprising : a dictionary collecting the words and sentences to be targets of speech recognition ; a speech analysis unit for performing speech analysis processing on captured speech ; an acoustic model unit holding speech patterns in phoneme units ; an utterance-deformation emotion model unit representing the deformation of the phoneme spectrum caused by emotion ; and a speech recognition unit for performing speech recognition processing on the speech analysis result by linking the acoustic model unit , the utterance-deformation emotion model unit , and the dictionary unit ; wherein the words and sentences targeted for speech recognition are output as a speech recognition result from the features of the speech , and the degree of the speaker's emotion carried in the speech is output using data (audio signal) from the utterance-deformation emotion model unit .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6101467A

Filed: 1997-09-29     Issued: 2000-08-08

Method of and system for recognizing a spoken text

(Original Assignee) US Philips Corp     (Current Assignee) Nuance Communications Austria GmbH

Heinrich Bartosik
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6101467A
CLAIM 8
. A method of recognizing spoken text uttered by a speaker , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data including available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the conversion data depending on the fourth digital data ;
converting the first digital data , into fifth digital data which represent recognized text depending on the adapted speech data (audio time frame, time frame) ;
and re-adapting the conversion data depending on the fifth digital data .
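
Claim 1's partition (front end and search stage on a first processor, distance calculation on a second, with data pipelined between them) can be approximated in software with two workers and queues. This is a hedged sketch of the data flow only, not the claimed hardware; the stage internals are invented stand-ins:

```python
import queue
import threading

fv_q = queue.Queue()    # front end -> distance calculation
dist_q = queue.Queue()  # distance calculation -> search stage
results = []

STATE_MEANS = [[0.0, 0.0], [3.0, 4.0]]  # toy acoustic state means

def processor_two():
    # "Second processor": distance calculation only.
    while True:
        fv = fv_q.get()
        if fv is None:          # end-of-stream marker
            dist_q.put(None)
            return
        dist_q.put([sum((a - b) ** 2 for a, b in zip(fv, m))
                    for m in STATE_MEANS])

def processor_one(frames):
    # "First processor": the front end feeds feature vectors in and the
    # search stage drains distances out, so both stages share one worker.
    worker = threading.Thread(target=processor_two)
    worker.start()
    for frame in frames:
        fv_q.put(list(frame))   # trivial stand-in for feature extraction
    fv_q.put(None)
    while (dists := dist_q.get()) is not None:
        results.append(min(range(len(dists)), key=dists.__getitem__))
    worker.join()

processor_one([[0.1, 0.2], [3.1, 3.9]])
print(results)  # → [0, 1] (best-matching state index per frame)
```

The queues give the pipelined hand-off: the second worker can score a frame while the first is still producing or consuming others.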

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US6101467A
CLAIM 8
. A method of recognizing spoken text uttered by a speaker , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data including available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the conversion data depending on the fourth digital data ;
converting the first digital data , into fifth digital data which represent recognized text depending on the adapted speech data (audio time frame, time frame) ;
and re-adapting the conversion data depending on the fifth digital data .

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (speech recognition means) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6101467A
CLAIM 3
. A system for recognizing a spoken text , comprising : conversion means for converting the spoken text uttered by a speaker into first digital text means data which represent the spoken text ;
a speech recognition unit , including : lexicon data means for storing lexicon data which represent a lexicon ;
language model data means for storing language model data which represent a language model ;
reference data means for storing reference data which represent phonemes ;
and speech recognition means (speech accelerator) to generate second digital text data which represent recognized text , in a speech recognition process depending on conversion data including : the first digital text data , the lexicon data , the language model data , and the reference data ;
means for obtaining third digital text data representing error correction data ;
error correction means for correcting the recognized text represented by the second digital text data depending on the third digital text data , by changing a part of the second digital text data depending on the third digital text data , and to generate fourth digital text data which represent corrected text ;
and adaptation means for adapting the speech recognition unit based on digital text data , including : means for adapting the lexicon data to the speaker depending on the fourth digital text data and storing the adapted lexicon data in the lexicon data means ;
means for adapting the language model data to the speaker depending on the fourth digital text data and storing the adapted language model data in the language model data means : means for adapting the reference data to the speaker depending on the first digital text data and the fourth digital text data and storing the adapted reference data in the reference data means ;
re-adaption means , including : means for converting the first digital data into fifth digital data which represent newly recognized text , using the speech recognition unit after the adaptation means has adapted the speech recognition data depending on the fourth digital data ;
and the adaptation means being for adapting the available reference data to the speaker of the spoken text depending on the fifth digital data and the first digital data .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means (conversion data) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
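The three-stage pipelined flow recited in claim 14 (front end → distance calculation → lexical-tree search) can be sketched as follows. This is a toy illustration under assumed, simplified models — the feature extraction, distance metric, and search here are placeholders, not the patent's circuit.

```python
import math

# Minimal sketch of claim 14's three stages; each audio frame flows through
# front end -> distance calculation -> search. All models are assumptions.

def front_end(frame):
    """Extract a toy feature vector: (mean energy, zero-crossing count)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (energy, crossings)

def distances(feature, states):
    """Euclidean distance between the feature vector and each acoustic state."""
    return {sid: math.dist(feature, mean) for sid, mean in states.items()}

def search(dist, score_so_far):
    """Toy search stage: keep the state with the smallest distance."""
    sid = min(dist, key=dist.get)
    return sid, score_so_far + dist[sid]

def run_pipeline(frames, states):
    score, path = 0.0, []
    for frame in frames:              # data pipelined stage to stage per frame
        f = front_end(frame)
        d = distances(f, states)
        sid, score = search(d, score)
        path.append(sid)
    return path, score
```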
US6101467A
CLAIM 1
. A method of recognizing a spoken text , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data (calculating means) including : available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the speech recognition process to the speaker depending on the fourth digital data ;
converting the first digital data into fifth digital data which represent additionally recognized text using the adapted speech recognition process ;
and adapting the available reference data to the speaker depending on the fifth digital data and the first digital data .

US6101467A
CLAIM 8
. A method of recognizing spoken text uttered by a speaker , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data including available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the conversion data depending on the fourth digital data ;
converting the first digital data , into fifth digital data which represent recognized text depending on the adapted speech data (audio time frame, time frame) ;
and re-adapting the conversion data depending on the fifth digital data .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6101467A
CLAIM 8
. A method of recognizing spoken text uttered by a speaker , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data including available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the conversion data depending on the fourth digital data ;
converting the first digital data , into fifth digital data which represent recognized text depending on the adapted speech data (audio time frame, time frame) ;
and re-adapting the conversion data depending on the fifth digital data .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6101467A
CLAIM 8
. A method of recognizing spoken text uttered by a speaker , comprising the steps of : converting spoken text uttered by a speaker into first digital data ;
converting the first digital data which represent the spoken text , into second digital data which represent recognized text in a speech recognition process depending on conversion data including available lexicon data which represent a lexicon , available language model data which represent a language model , and available reference data which represent phonemes ;
communicating the recognized text ;
obtaining third digital data which represent corrections to the recognized text , depending on the communication of recognized text ;
correcting the second digital data using the third digital data to generate fourth digital data which represent corrected text ;
adapting the conversion data depending on the fourth digital data ;
converting the first digital data , into fifth digital data which represent recognized text depending on the adapted speech data (audio time frame, time frame) ;
and re-adapting the conversion data depending on the fifth digital data .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6771743B1

Filed: 1997-08-12     Issued: 2004-08-03

Voice processing system, method and computer program product having common source for internet world wide web pages and voice applications

(Original Assignee) International Business Machines Corp     (Current Assignee) Nuance Communications Inc

Nicholas David Butler, Jeremy Peter James Hughes, Stephen Graham Copinger Lawrence, Susan Malaika, Lawrence Leon Porter
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator has an interrupt signal (incoming telephone call) to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6771743B1
CLAIM 11
. The apparatus of claim 10 wherein said means for converting converts said HTML document prior to the receipt of an incoming telephone call (interrupt signal) .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (incoming call) , to the distance calculation , and to the word identification .
US6771743B1
CLAIM 3
. The system of claim 2 wherein said means for forming forms said voice application each time an incoming call (feature calculation) is received .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6038305A

Filed: 1997-08-01     Issued: 2000-03-14

Personal dial tone service with personalized caller ID

(Original Assignee) Bell Atlantic Network Services Inc     (Current Assignee) Google LLC

Alexander I. McAllister, Robert D. Farris, Michael J. Strauss
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor (control instructions) , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
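Claim 1's processor split — front end and search stage on a first processor, distance calculation on a second, with data pipelined between them — can be modeled with two workers connected by queues. A minimal sketch under stated assumptions: the "processors" are threads, and the feature and distance computations are placeholders.

```python
import queue
import threading

# Assumed software model of claim 1's split: cpu1 runs the front end and the
# search stage; cpu2 runs the distance calculation; queues carry the
# pipelined data flow. All computations are illustrative toys.

def cpu1(frames, to_cpu2, from_cpu2, results):
    for frame in frames:                      # front-end half: emit features
        to_cpu2.put(sum(frame) / len(frame))  # toy 1-D "feature vector"
    to_cpu2.put(None)                         # end-of-stream marker
    while True:                               # search half: consume distances
        d = from_cpu2.get()
        if d is None:
            break
        results.append('word' if d < 1.0 else 'silence')

def cpu2(to_cpu2, from_cpu2, state_mean=0.5):
    while True:                               # distance calculation stage
        f = to_cpu2.get()
        if f is None:
            from_cpu2.put(None)
            break
        from_cpu2.put(abs(f - state_mean))    # toy 1-D distance

def run(frames):
    to_cpu2, from_cpu2, results = queue.Queue(), queue.Queue(), []
    t = threading.Thread(target=cpu2, args=(to_cpu2, from_cpu2))
    t.start()
    cpu1(frames, to_cpu2, from_cpu2, results)
    t.join()
    return results
```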
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data (audio time frame, time frame) corresponding to a plurality of subscribers associated with the predetermined link .

US6038305A
CLAIM 45
. A telephone central office switching system comprising : interface units for coupling to a plurality of communication links ;
a switch fabric for providing selective communication connections between the interface units ;
and an administrative module for controlling connections provided by the switch fabric , wherein the administrative module comprises : mass storage containing subscriber profiles , each of at least some of the subscriber profiles including data representing identity of a specific , associated subscriber , a processor for providing control instructions (first processor) to the switch fabric , and a signaling interface for coupling the processor to a signaling link , for communication with an external network node , wherein : in response to a virtual office equipment number received via the signaling interface , the processor retrieves a subscriber profile corresponding to the virtual office equipment number from the mass storage , uses the retrieved profile to process a selective connection through the switch fabric between two of the interface units and forwards the data representing identity from the retrieved profile through one of the interface units .

US7979277B2
CLAIM 2
. A speech recognition circuit as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor (control instructions) .
US6038305A
CLAIM 45
. A telephone central office switching system comprising : interface units for coupling to a plurality of communication links ;
a switch fabric for providing selective communication connections between the interface units ;
and an administrative module for controlling connections provided by the switch fabric , wherein the administrative module comprises : mass storage containing subscriber profiles , each of at least some of the subscriber profiles including data representing identity of a specific , associated subscriber , a processor for providing control instructions (first processor) to the switch fabric , and a signaling interface for coupling the processor to a signaling link , for communication with an external network node , wherein : in response to a virtual office equipment number received via the signaling interface , the processor retrieves a subscriber profile corresponding to the virtual office equipment number from the mass storage , uses the retrieved profile to process a selective connection through the switch fabric between two of the interface units and forwards the data representing identity from the retrieved profile through one of the interface units .

US7979277B2
CLAIM 3
. A speech recognition circuit as claimed in claim 1 , comprising dynamic scheduling whether the first processor (control instructions) should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US6038305A
CLAIM 45
. A telephone central office switching system comprising : interface units for coupling to a plurality of communication links ;
a switch fabric for providing selective communication connections between the interface units ;
and an administrative module for controlling connections provided by the switch fabric , wherein the administrative module comprises : mass storage containing subscriber profiles , each of at least some of the subscriber profiles including data representing identity of a specific , associated subscriber , a processor for providing control instructions (first processor) to the switch fabric , and a signaling interface for coupling the processor to a signaling link , for communication with an external network node , wherein : in response to a virtual office equipment number received via the signaling interface , the processor retrieves a subscriber profile corresponding to the virtual office equipment number from the mass storage , uses the retrieved profile to process a selective connection through the switch fabric between two of the interface units and forwards the data representing identity from the retrieved profile through one of the interface units .

US7979277B2
CLAIM 4
. A speech recognition circuit as claimed in claim 1 , wherein the first processor (control instructions) supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US6038305A
CLAIM 45
. A telephone central office switching system comprising : interface units for coupling to a plurality of communication links ;
a switch fabric for providing selective communication connections between the interface units ;
and an administrative module for controlling connections provided by the switch fabric , wherein the administrative module comprises : mass storage containing subscriber profiles , each of at least some of the subscriber profiles including data representing identity of a specific , associated subscriber , a processor for providing control instructions (first processor) to the switch fabric , and a signaling interface for coupling the processor to a signaling link , for communication with an external network node , wherein : in response to a virtual office equipment number received via the signaling interface , the processor retrieves a subscriber profile corresponding to the virtual office equipment number from the mass storage , uses the retrieved profile to process a selective connection through the switch fabric between two of the interface units and forwards the data representing identity from the retrieved profile through one of the interface units .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data) .
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data (audio time frame, time frame) corresponding to a plurality of subscribers associated with the predetermined link .

US7979277B2
CLAIM 10
. The speech recognition circuit of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory (speech signal) .
US6038305A
CLAIM 1
. A method comprising : detecting a request to make a call from a predetermined link through a multiple link communication network ;
receiving and processing speech signal (result memory) s from a person requesting the call via the predetermined link to identify the person as an individual one of a plurality of subscribers to services offered through the communication network ;
instructing a switching office of the communication network to select one subscriber service profile corresponding to the identified individual subscriber , from a plurality of stored profiles , for processing of the call from the predetermined link through the network to a destination station ;
and based on data from the one selected subscriber service profile , transmitting data representing the identity of the identified individual subscriber to the destination station .

US7979277B2
CLAIM 12
. The speech recognition circuit of claim 1 , wherein the audio front end is configured to input a digital audio signal (receiving signals) .
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals (digital audio signal) from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data corresponding to a plurality of subscribers associated with the predetermined link .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data (audio time frame, time frame) corresponding to a plurality of subscribers associated with the predetermined link .

US7979277B2
CLAIM 15
. A speech recognition method (voice authentication) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data (audio time frame, time frame) corresponding to a plurality of subscribers associated with the predetermined link .

US6038305A
CLAIM 15
. A telecommunication network comprising : a central office for processing calls originated over a plurality of communication links , said central office including mass storage containing subscriber service profiles for a plurality of subscribers each having an individually assigned office equipment number ;
and a peripheral coupled to the central office , said peripheral including a voice authentication (speech recognition method) module for analyzing speech of a caller from one communication link to identify the caller as one individual from among the subscribers and provide an office equipment number assigned to the identified individual subscriber to the central office , wherein : the central office retrieves one of the subscriber service profiles , which corresponds to the office equipment number assigned to the identified individual subscriber and processes at least one call over a communication link using information from the retrieved profile , and the profile information includes at least some data identifying the individual caller for use in caller identification services .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (voice authentication) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification .
US6038305A
CLAIM 3
. A method as in claim 1 , wherein the step of receiving and processing speech signals comprises : receiving signals from the person over the predetermined link , and comparing information characteristic of the received speech signals to stored speech data (audio time frame, time frame) corresponding to a plurality of subscribers associated with the predetermined link .

US6038305A
CLAIM 15
. A telecommunication network comprising : a central office for processing calls originated over a plurality of communication links , said central office including mass storage containing subscriber service profiles for a plurality of subscribers each having an individually assigned office equipment number ;
and a peripheral coupled to the central office , said peripheral including a voice authentication (speech recognition method) module for analyzing speech of a caller from one communication link to identify the caller as one individual from among the subscribers and provide an office equipment number assigned to the identified individual subscriber to the central office , wherein : the central office retrieves one of the subscriber service profiles , which corresponds to the office equipment number assigned to the identified individual subscriber and processes at least one call over a communication link using information from the retrieved profile , and the profile information includes at least some data identifying the individual caller for use in caller identification services .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
CN1171592A

Filed: 1997-04-30     Issued: 1998-01-28

Speech recognition method and system using continuous density hidden Markov models

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Xuedong Huang, Milind V. Mahajan
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states (states) of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
CN1171592A
CLAIM 12
. A method of recognizing input speech in a computer-readable storage medium , said method comprising the steps of : training a set of context-dependent continuous density hidden Markov models to represent a set of phoneme units of speech , said training using a quantity of speech training data representing the acoustic characteristics of said speech within a given time interval , each model having states (acoustic states) associated with transitions , each state representing a portion of a phoneme unit and having an output probability , said output probability indicating the probability that an acoustic characteristic of speech appears in said portion of the phoneme unit ; for a set of context-dependent continuous density hidden Markov models representing the same phoneme unit of speech , generating a context-independent continuous density hidden Markov model ; generating a set of sequences of context-dependent models , each sequence representing a linguistic expression ; for the context-dependent models of each sequence , determining an acoustic probability that the acoustic characteristics of said input speech match the states in the context-dependent models of said sequence , said acoustic probability including the output probability of each state of each context-dependent model in the sequence and the output probability of the context-independent model corresponding to the same phoneme unit ; and identifying , using said acoustic probabilities , the linguistic expression that most closely matches said input speech .
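The combination of output probabilities described in CN1171592A (context-dependent model smoothed with the context-independent model for the same phoneme unit, weighted by training-data quantity per its claim 8) can be sketched as a simple interpolation. The weighting formula below is an illustrative assumption, not the patent's disclosed scheme.

```python
# Hedged sketch of the CN1171592A-style smoothing: interpolate the
# context-dependent (senone) output probability with the context-independent
# one; more training data shifts weight toward the detailed model.
# The weight formula w = n / (n + k) is an assumption for illustration.

def smoothed_output_prob(p_context_dep, p_context_indep, n_train, k=100.0):
    """Combine the two output probabilities by a data-dependent weight."""
    w = n_train / (n_train + k)   # assumed weighting scheme
    return w * p_context_dep + (1.0 - w) * p_context_indep
```

With no training data the estimate falls back entirely to the context-independent model; with abundant data it approaches the context-dependent one.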

US7979277B2
CLAIM 11
. The speech recognition circuit of claim 1 , comprising increasing the pipeline depth by computing extra front frames (a representation) in advance .
CN1171592A
CLAIM 8
. A method as claimed in claim 5 , characterized in that the step of generating a set of context-dependent senones further comprises the steps of : training said context-dependent senones using a quantity of training data representing speech ; generating , for each context-dependent senone , a representation (computing extra front frames) of a weighting factor indicating the quantity of training data used to estimate said senone ; and the step of determining output probabilities further comprises the step of combining said context-dependent senones and context-independent senones according to said weighting factor .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (computer system) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
CN1171592A
CLAIM 1
. A method of matching input speech to a linguistic expression in a computer system (calculating means) , the method comprising the steps of : for each of a plurality of phoneme units of speech , providing a set of more detailed acoustic models and one less detailed acoustic model representing that phoneme unit , each acoustic model having a set of states followed by a set of transitions , each state representing a portion of the speech occurring in that phoneme unit at a point in time and having an output probability representing the likelihood that a portion of the input speech occurs in that phoneme unit at that point in time ; for each selected sequence of more detailed acoustic models , determining how closely the input speech matches that sequence , said matching further comprising the steps of : for each state of a selected sequence of more detailed acoustic models , determining an accumulated output probability as a combination of the output probability of that state and of the same state of the less detailed acoustic model representing the same phoneme unit ; and determining the sequence that best matches the input speech , this sequence representing a linguistic expression .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
JPH10282986A

Filed: 1997-04-04     Issued: 1998-10-23

Speech recognition method and model design method therefor

(Original Assignee) Hitachi Ltd

Tomohito Nakagawa, Hideo Maejima
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data, network) ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
JPH10282986A
CLAIM 4
[Claim 4] The HMM model design method for a speech recognition system according to claim 1 , wherein said third step uses a neural network (audio time frame, time frame) , adjusts coupling coefficients by the error back-propagation method based on input training data , and , after correcting the parameters , performs a convergence determination .

JPH10282986A
CLAIM 7
[Claim 7] The output probability calculation method for a continuous-distribution HMM according to claim 5 or 6 , wherein said first step inputs a feature vector , which is speech data (audio time frame, time frame) , to the input layer of a neural network , executes a calculation at each node of the neural network , and takes the calculated value of the node corresponding to the output as the final output .

US7979277B2
CLAIM 7
. The speech recognition circuit of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame (speech data, network) .
JPH10282986A
CLAIM 4
[Claim 4] The HMM model design method for a speech recognition system according to claim 1 , wherein said third step uses a neural network (audio time frame, time frame) , adjusts coupling coefficients by the error back-propagation method based on input training data , and , after correcting the parameters , performs a convergence determination .

JPH10282986A
CLAIM 7
[Claim 7] The output probability calculation method for a continuous-distribution HMM according to claim 5 or 6 , wherein said first step inputs a feature vector , which is speech data (audio time frame, time frame) , to the input layer of a neural network , executes a calculation at each node of the neural network , and takes the calculated value of the node corresponding to the output as the final output .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (speech data, network) ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
JPH10282986A
CLAIM 4
[Claim 4] The HMM model design method for a speech recognition system according to claim 1 , wherein said third step uses a neural network (audio time frame, time frame) , adjusts coupling coefficients by the error back-propagation method based on input training data , and , after correcting the parameters , performs a convergence determination .

JPH10282986A
CLAIM 7
[Claim 7] The output probability calculation method for a continuous-distribution HMM according to claim 5 or 6 , wherein said first step inputs a feature vector , which is speech data (audio time frame, time frame) , to the input layer of a neural network , executes a calculation at each node of the neural network , and takes the calculated value of the node corresponding to the output as the final output .

US7979277B2
CLAIM 15
. A speech recognition method (system) , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio data, work) ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
JPH10282986A
CLAIM 1
[Claim 1] A model design method for a speech recognition system (speech recognition method) using a continuous-distribution HMM, that is, a speech recognition method based on hidden Markov models (HMMs) in which the output probability of the HMM is defined by a probability density function (output function), the method comprising: a first step of training with samples classified into categories by speaker characteristics, such as age and gender, to determine an optimal output function for each category; and a second step of determining an overall output function using the per-category output functions determined in the first step.

JPH10282986A
CLAIM 4
[Claim 4] The HMM model design method for a speech recognition system according to claim 1, wherein the third step uses a neural network (audio time frame, time frame), adjusts the connection weights by error backpropagation based on the input training data, corrects the parameters, and then performs a convergence test.

JPH10282986A
CLAIM 7
[Claim 7] The output probability calculation method for a continuous-distribution HMM according to claim 5 or 6, wherein the first step inputs a feature vector, which is audio data (audio time frame, time frame), to the input layer of a neural network, executes the computation for each node of the neural network, and takes the computed value of the node corresponding to the output as the final output.

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method (system) , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame (audio data, work) ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation (calculation method) , to the distance calculation (calculation method) , and to the word identification .
JPH10282986A
CLAIM 1
[Claim 1] A model design method for a speech recognition system (speech recognition method) using a continuous-distribution HMM, that is, a speech recognition method based on hidden Markov models (HMMs) in which the output probability of the HMM is defined by a probability density function (output function), the method comprising: a first step of training with samples classified into categories by speaker characteristics, such as age and gender, to determine an optimal output function for each category; and a second step of determining an overall output function using the per-category output functions determined in the first step.

JPH10282986A
CLAIM 4
[Claim 4] The HMM model design method for a speech recognition system according to claim 1, wherein the third step uses a neural network (audio time frame, time frame), adjusts the connection weights by error backpropagation based on the input training data, corrects the parameters, and then performs a convergence test.

JPH10282986A
CLAIM 5
[Claim 5] An output probability calculation method (feature calculation, distance calculation) for a continuous-distribution HMM, that is, a speech recognition method using HMMs in which the output probability of the HMM is defined by an output function, the method comprising: a first step of providing, for each state of each HMM, parameters that determine an output function for each category, and of clearly estimating the speaker's category from its characteristics using analysis information of the input speech; and a second step of determining the output probability from the category information estimated in the first step and the speech analysis information.

JPH10282986A
CLAIM 7
[Claim 7] The output probability calculation method for a continuous-distribution HMM according to claim 5 or 6, wherein the first step inputs a feature vector, which is audio data (audio time frame, time frame), to the input layer of a neural network, executes the computation for each node of the neural network, and takes the computed value of the node corresponding to the output as the final output.
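JPH10282986A claim 7 feeds the feature vector into a neural network's input layer, evaluates every node, and takes the output node's value as the final result. A toy forward pass of that shape, with made-up weights and a sigmoid squashing the result into (0, 1) so it can play the role of an HMM output probability, might look like:

```python
import math

def nn_output_probability(feature_vector, hidden_w, output_w):
    """Toy single-hidden-layer forward pass: the feature vector enters
    the input layer, every node computes its value, and the output
    node's value is the final output. Sizes, weights, and the sigmoid
    nonlinearity are illustrative assumptions, not from the patent."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Each hidden node takes a weighted sum of the inputs.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, feature_vector)))
              for ws in hidden_w]
    # The single output node's value serves as the output probability.
    return sigmoid(sum(w * h for w, h in zip(output_w, hidden)))
```

With identity-like hidden weights and a zero input vector, both hidden nodes sit at 0.5 and the output node returns sigmoid(1.0).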




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6208638B1

Filed: 1997-04-01     Issued: 2001-03-27

Method and apparatus for transmission and retrieval of facsimile and audio messages over a circuit or packet switched network

(Original Assignee) j 2 Global Communications Inc     (Current Assignee) j 2 Global Communications Inc ; J2 Cloud Services LLC

Jack Rieley, Jaye Muller
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor (digital representation) , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6208638B1
CLAIM 1
. A system comprising : a set of switches coupled to a circuit switched network for receiving a set of incoming call signals , wherein the incoming call signal includes an inbound address , and wherein a switch in the set of switches redirects an incoming call signal from a first communications server to a second communications server if a first condition occurs ;
and , a set of communications servers coupled to the set of switches for receiving the set of incoming call signals , each communications server being coupled to a network and containing a message processing resource configured to process a received audio message into a digital representation (second processor) , wherein each communications server further comprises a trunk line interface to extract the inbound address and stores the inbound address , a set of final destination addresses and account status , and the message processing resource is further configured to determine , based on the inbound address , a user account and a destination on a packet switched network and send the digital representation to the destination , wherein the inbound address is assigned to the user account and the outbound address comprises at least one email address .
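The "calculating circuit" of US7979277B2 claim 1, charted above, computes distances between a feature vector and predetermined acoustic states. One common realisation of such a distance, offered here only as an assumed example since the claim does not fix the metric, is the negative log-likelihood under a diagonal-covariance Gaussian state model:

```python
import math

def state_distance(feature, mean, var):
    """Negative log-likelihood of a feature vector under one acoustic
    state modelled as a diagonal-covariance Gaussian; a smaller value
    means the frame is more similar to the state. A standard metric in
    HMM recognisers, but not one mandated by the patent."""
    d = 0.0
    for x, m, v in zip(feature, mean, var):
        # Per-dimension Gaussian negative log-density, summed over dims.
        d += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return d
```

A frame equal to a state's mean scores a strictly smaller (better) distance than a mismatched frame, which is what the search stage relies on.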

US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice message) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6208638B1
CLAIM 12
. The system of claim 1 , where the message processing resource further comprises a processor to : determine if the audio message contains a facsimile message or a voice message (speech accelerator) ;
and , digitize the audio message if the audio message contains the voice message and receive the facsimile message if the audio message contains the facsimile message .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6021181A

Filed: 1997-02-24     Issued: 2000-02-01

Electronic voice mail message handling system

(Original Assignee) Wildfire Communications Inc     (Current Assignee) Orange SA

Richard A. Miner, David M. Pelland, William J. Warner, Nancy Benovich Gilby
US7979277B2
CLAIM 9
. The speech recognition circuit of claim 1 , wherein the speech accelerator (voice message) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US6021181A
CLAIM 1
. A message handling method implemented by a computer-based electronic assistant , said method comprising : storing a voice message (speech accelerator) that includes a recorded message and stored information describing said voice message , and wherein said voice message is a reply to a first message ;
receiving a command of a first command type from a caller ;
in response to receiving said command of said first command type , playing said recorded message to said caller ;
after playing said recorded contents , receiving a command of a second command type from said caller ;
in response to receiving the command of said second command type , reporting to said caller said stored information describing said voice message ;
and after reporting said stored information , receiving a further command of said second type ;
and in response to receiving the further command of said second command type , playing said first message to which the voice message was a reply .

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (second command) .
US6021181A
CLAIM 1
. A message handling method implemented by a computer-based electronic assistant , said method comprising : storing a voice message that includes a recorded message and stored information describing said voice message , and wherein said voice message is a reply to a first message ;
receiving a command of a first command type from a caller ;
in response to receiving said command of said first command type , playing said recorded message to said caller ;
after playing said recorded contents , receiving a command of a second command (word identification) type from said caller ;
in response to receiving the command of said second command type , reporting to said caller said stored information describing said voice message ;
and after reporting said stored information , receiving a further command of said second type ;
and in response to receiving the further command of said second command type , playing said first message to which the voice message was a reply .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US6018710A

Filed: 1996-12-13     Issued: 2000-01-25

Web-based interactive radio environment: WIRE

(Original Assignee) Siemens Corporate Research Inc     (Current Assignee) Siemens Corp

Michael J. Wynblatt, Arding Hsu, Daniel C. Benson
US7979277B2
CLAIM 1
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit (calculation means) for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US6018710A
CLAIM 11
. A non-visual browsing environment for the world-wide web comprising : a system for rendering structured documents using audio ;
an interface for information exchange to users ;
and , a non-keyword based search system ;
wherein said system for rendering structured documents using audio comprises : a pre-rendering system which converts a structured document into an intermediate document ;
and , an audio rendering system which actually generates an audio output ;
wherein said pre-rendering system comprises : a segmenting document system for dividing said structured document into logical segments ;
a categorizing document system for categorizing said logical segments as either navigation segments or content segments ;
a document sectioning system for determining section structure of said structured document ;
and , a speech mark-up information system for creating said intermediate document which can be interpreted by a text-to-speech engine ;
wherein said categorizing document system comprises : calculation means (calculating circuit) for calculating a link density of each of said logical segments according to the formula : ##EQU2## where D is said link density , C HREF is a number of non-tag characters in each of said logical segments which appear inside of HREF tags , C is a total number of non-tag characters in each of said logical segments , L I is a number of hyperlinks within image maps in each of said logical segments and K represents a weight given to image map links .
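US6018710A claim 11 defines a link density D from C HREF (non-tag characters inside HREF tags), C (total non-tag characters), L I (image-map hyperlinks), and a weight K, but the equation itself survives in this chart only as the placeholder ##EQU2##. The sketch below therefore uses an assumed reconstruction, D = (C_HREF + K * L_I) / C, inferred from the variable descriptions; it is not the patent's verified formula:

```python
def link_density(c_href, c_total, l_i, k=1.0):
    """Link density of a document segment using the variable names from
    US6018710A claim 11. The combination (c_href + k * l_i) / c_total
    is an ASSUMED reconstruction of the claim's ##EQU2## placeholder,
    not a formula confirmed by the patent text."""
    if c_total == 0:
        return 0.0  # empty segment: define density as zero
    return (c_href + k * l_i) / c_total
```

For example, a segment with 100 non-tag characters, 50 of them inside HREF tags, and 2 image-map links weighted at K = 10 would score 0.7 under this assumed form.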

US7979277B2
CLAIM 5
. A speech recognition circuit as claimed in claim 1 , wherein the said calculating circuit (calculation means) is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US6018710A
CLAIM 11
. A non-visual browsing environment for the world-wide web comprising : a system for rendering structured documents using audio ;
an interface for information exchange to users ;
and , a non-keyword based search system ;
wherein said system for rendering structured documents using audio comprises : a pre-rendering system which converts a structured document into an intermediate document ;
and , an audio rendering system which actually generates an audio output ;
wherein said pre-rendering system comprises : a segmenting document system for dividing said structured document into logical segments ;
a categorizing document system for categorizing said logical segments as either navigation segments or content segments ;
a document sectioning system for determining section structure of said structured document ;
and , a speech mark-up information system for creating said intermediate document which can be interpreted by a text-to-speech engine ;
wherein said categorizing document system comprises : calculation means (calculating circuit) for calculating a link density of each of said logical segments according to the formula : ##EQU2## where D is said link density , C HREF is a number of non-tag characters in each of said logical segments which appear inside of HREF tags , C is a total number of non-tag characters in each of said logical segments , L I is a number of hyperlinks within image maps in each of said logical segments and K represents a weight given to image map links .

US7979277B2
CLAIM 14
. A speech recognition circuit , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means (other parameters) for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US6018710A
CLAIM 7
. A non-visual browsing environment for the world-wide web as claimed in claim 4 wherein said speech mark-up information system comprises : meta-information generation means for producing meta-information in a form of commands which will cause said text-to-speech engine to vary voice , tone , rate , and other parameters (calculating means) to adequately convey information within said structured document .

US7979277B2
CLAIM 15
. A speech recognition method , comprising : calculating a feature vector from an audio signal using an audio front end , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model using a calculating circuit (calculation means) ;

and using a search stage to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words ;

wherein data is pipelined from the front end , to the calculating circuit , and to the search stage .
US6018710A
CLAIM 11
. A non-visual browsing environment for the world-wide web comprising : a system for rendering structured documents using audio ;
an interface for information exchange to users ;
and , a non-keyword based search system ;
wherein said system for rendering structured documents using audio comprises : a pre-rendering system which converts a structured document into an intermediate document ;
and , an audio rendering system which actually generates an audio output ;
wherein said pre-rendering system comprises : a segmenting document system for dividing said structured document into logical segments ;
a categorizing document system for categorizing said logical segments as either navigation segments or content segments ;
a document sectioning system for determining section structure of said structured document ;
and , a speech mark-up information system for creating said intermediate document which can be interpreted by a text-to-speech engine ;
wherein said categorizing document system comprises : calculation means (calculating circuit) for calculating a link density of each of said logical segments according to the formula : ##EQU2## where D is said link density , C HREF is a number of non-tag characters in each of said logical segments which appear inside of HREF tags , C is a total number of non-tag characters in each of said logical segments , L I is a number of hyperlinks within image maps in each of said logical segments and K represents a weight given to image map links .




US7979277B2

Filed: 2004-09-14     Issued: 2011-07-12

Speech recognition circuit and method

(Original Assignee) Zentian Ltd     (Current Assignee) Zentian Ltd

Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
US5881134A

Filed: 1996-09-03     Issued: 1999-03-09

Intelligent call processing platform for home telephone system

(Original Assignee) Voice Control Systems Inc     (Current Assignee) Philips North America LLC

Peter J. Foster, Bernard F. Bareis
US7979277B2
CLAIM 1
. A speech recognition circuit (first voice) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end and said search stage are implemented using a first processor , and said calculating circuit is implemented using a second processor , and wherein data is pipelined from the front end to the calculating circuit to the search stage .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 2
. A speech recognition circuit (first voice) as claimed in claim 1 , wherein the pipelining comprises alternating of front end and search stage processing on the first processor .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 3
. A speech recognition circuit (first voice) as claimed in claim 1 , comprising dynamic scheduling whether the first processor should run the front end or search stage code , based on availability or unavailability of distance results and/or availability of space for storing more feature vectors and/or distance results .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 4
. A speech recognition circuit (first voice) as claimed in claim 1 , wherein the first processor supports multi-threaded operation , and runs the search stage and front ends as separate threads .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 5
. A speech recognition circuit (first voice) as claimed in claim 1 , wherein the said calculating circuit is configured to autonomously calculate distances for every acoustic state defined by the acoustic model .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 6
. The speech recognition circuit (first voice) of claim 1 , comprising control means adapted to implement frame dropping , to discard one or more audio time frames .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 7
. The speech recognition circuit (first voice) of claim 1 , wherein the feature vector comprises a plurality of spectral components of an audio signal for a predetermined time frame .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 8
. The speech recognition circuit (first voice) of claim 1 , wherein the processor is configured to divert to another task if the data flow stalls .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 9
. The speech recognition circuit (first voice) of claim 1 , wherein the speech accelerator (first voice) has an interrupt signal to inform the front end that the accelerator is ready to receive a next feature vector from the front end .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 10
. The speech recognition circuit (first voice) of claim 1 , wherein the accelerator signals to the search stage when the distances for a new frame are available in a result memory .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .

US7979277B2
CLAIM 11
. The speech recognition circuit (first voice) of claim 1 , comprising increasing the pipeline depth by computing extra front frames in advance .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .
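Claim 11 above describes deepening the pipeline by letting the front end compute extra frames in advance of the downstream stages. One way to sketch this is a bounded queue whose capacity sets the lookahead depth, so the producer can run ahead by that many frames (an assumed structure for illustration only):

```python
from queue import Queue
from threading import Thread

LOOKAHEAD = 3  # extra front frames computed in advance (pipeline depth)

def front_end(samples, out):
    # Emit one toy "feature" per input frame; the bounded queue lets this
    # stage run up to LOOKAHEAD frames ahead of the consumer, then blocks.
    for frame in samples:
        out.put(sum(frame) / len(frame))  # toy feature: frame mean
    out.put(None)                         # end-of-stream marker

frames_in = [[1, 3], [2, 4], [5, 5], [0, 2]]
q = Queue(maxsize=LOOKAHEAD)
Thread(target=front_end, args=(frames_in, q)).start()

features = []
while (v := q.get()) is not None:
    features.append(v)
```

Raising `LOOKAHEAD` increases how far the front end can get ahead, trading memory for tolerance of downstream stalls.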

US7979277B2
CLAIM 12
. The speech recognition circuit (first voice) of claim 1 , wherein the audio front end is configured to input a digital audio signal .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .
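Claim 12 above has the audio front end take a digital audio signal as input; claim 1's front end then derives a feature vector of extracted quantities per defined time frame. A toy sketch of that framing step, using frame energy and zero-crossing count as stand-in quantities (not the patent's actual features):

```python
def frame_features(signal, frame_len):
    """Split a digital audio signal into fixed-length frames and derive a
    small feature vector (energy, zero-crossing count) per frame.
    Toy quantities chosen for illustration only."""
    feats = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        energy = sum(s * s for s in frame)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        feats.append((energy, zcr))
    return feats

features = frame_features([1, -1, 2, -2, 3, 3, -3, 3], frame_len=4)
```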

US7979277B2
CLAIM 13
. A speech recognition circuit (first voice) of claim 1 , wherein said distance comprises a Mahalanobis distance .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .
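Claim 13 above specifies that the distance between a feature vector and an acoustic state comprises a Mahalanobis distance. For the diagonal-covariance Gaussians commonly used in HMM acoustic models, this reduces to a per-dimension variance-weighted Euclidean distance; a sketch under that assumption (function and parameter names are illustrative):

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance between feature vector x and a Gaussian
    acoustic state with diagonal covariance (mean, var) -- the usual
    HMM simplification of the full inverse-covariance form."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

d = mahalanobis_diag([1.0, 2.0], mean=[0.0, 0.0], var=[1.0, 4.0])
```

With a full covariance matrix the sum is replaced by `(x - mean)^T * inv(cov) * (x - mean)`; the diagonal case above is the special case where `inv(cov)` has only diagonal entries `1/var[i]`.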

US7979277B2
CLAIM 14
. A speech recognition circuit (first voice) , comprising : an audio front end for calculating a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

calculating means for calculating a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and a search stage for using said calculated distances to identify words within a lexical tree , the lexical tree comprising a model of words ;

wherein said audio front end , said calculating means , and said search stage are connected to each other to enable pipelined data flow .
US5881134A
CLAIM 1
. A control platform for use between a remote switching system and an on-site telephone system including at least one telephone , the platform comprising : means for interfacing the platform directly to the on-site telephone system ;
a trigger recognizer for recognizing voice or DTMF command triggers input through the at least one telephone ;
announcing means for providing predetermined messages ;
processor means , operative under the control of a program stored therein and responsive to receipt of an incoming call from the remote switching system , for identifying a source identifier of the incoming call ;
and the processor means , further operative under the control of the program and responsive to receipt by the trigger recognizer of a first voice (speech recognition circuit, speech accelerator) or DTMF command trigger during the ringing stage of the incoming call , for controlling the announcing means to announce to a user an identification associated with the source identifier of the incoming call .
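Claim 14 above connects the audio front end, the distance calculator, and the search stage so data flows through them as a pipeline. Python generators give a compact sketch of that staged data flow; the single-mean "states" and two-word "lexical tree" below are deliberately trivial stand-ins:

```python
def front_end(frames):
    # Stage 1: one toy feature per audio time frame (the frame mean).
    for frame in frames:
        yield sum(frame) / len(frame)

def distance_stage(features, state_means):
    # Stage 2: distance of each feature to each acoustic-state mean.
    for f in features:
        yield [abs(f - m) for m in state_means]

def search_stage(distances, words):
    # Stage 3: pick the word whose state is nearest -- a stand-in for
    # traversing a real lexical tree with the calculated distances.
    for dist in distances:
        yield words[dist.index(min(dist))]

audio = [[0.9, 1.1], [4.0, 4.0]]
states = [1.0, 4.0]            # one mean per acoustic state (hypothetical)
vocab = ["yes", "no"]
best = list(search_stage(distance_stage(front_end(audio), states), vocab))
```

Because each stage consumes the previous stage's output lazily, frame N can be searched while frame N+1 is still in the distance stage, which is the pipelined arrangement the claim describes.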

US7979277B2
CLAIM 16
. A non-transitory storage medium storing processor implementable code for controlling at least one processor to implement a speech recognition method , the code comprising : code for controlling the processor to calculate a feature vector from an audio signal , wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame ;

code for controlling the processor to calculate a distance indicating the similarity between a feature vector and a predetermined acoustic state of an acoustic model ;

and code for controlling the processor to identify words within a lexical tree using said calculated distances , the lexical tree comprising a model of words , wherein data is pipelined by the processor pursuant to the code from the feature calculation , to the distance calculation , and to the word identification (second command) .
US5881134A
CLAIM 2
. The control platform of claim 1 , wherein the processor means is further operative under the control of the program and responsive to receipt by the trigger recognizer of a second command (word identification) trigger entered by the user , for forwarding the incoming call to a selected service .