Apex Standards Claim Construction


Patent: US7979277B2
Filed: 2004-09-14
Issued: 2011-07-12
Patent Holder: (Original Assignee) Zentian Ltd; (Current Assignee) Zentian Ltd
Inventor(s): Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds

Title: Speech recognition circuit and method

Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.
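The read, add, and write-back cycle described in the abstract can be sketched in software as follows. This is only an illustrative sketch, not the patented circuit: the `StateMemory` class, the state and node names, and the convention that a lower accumulated score is better are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StateMemory:
    """Hypothetical stand-in for the memory structure: state id -> [node id, score]."""
    entries: dict = field(default_factory=dict)

    def store(self, state_id, node_id, score=0.0):
        self.entries[state_id] = [node_id, score]

    def read_score(self, state_id):
        return self.entries[state_id][1]

    def write_score(self, state_id, score):
        self.entries[state_id][1] = score


def accumulate(memory, score_updates):
    """Accumulator: read each score, add its update, write the result back."""
    for state_id, update in score_updates.items():
        memory.write_score(state_id, memory.read_score(state_id) + update)


def select_best_node(memory):
    """Selector: pick the node with the lowest score (assumed: lower is better)."""
    node_id, _ = min(memory.entries.values(), key=lambda entry: entry[1])
    return node_id


memory = StateMemory()
memory.store("s0", node_id="n_cat")
memory.store("s1", node_id="n_cap")
accumulate(memory, {"s0": 1.5, "s1": 4.0})  # score updates for one audio frame
accumulate(memory, {"s0": 0.5, "s1": 1.0})  # score updates for the next frame
best = select_best_node(memory)
```

Here the score updates are supplied by hand; in the abstract they come from a score update generating circuit driven by the audio input.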

The First Claim: 1. A speech recognition circuit, comprising: an audio front end for calculating a feature vector from an audio signal, wherein the feature vector comprises a plurality of extracted and/or derived quantities from said audio signal during a defined audio time frame; a calculating circuit for calculating distances indicating the similarity between a feature vector and a plurality of predetermined acoustic states of an acoustic model; and a search stage for using said calculated distances to identify words within a lexical tree, the lexical tree comprising a model of words; wherein said audio front end and said search stage are implemented using a first processor, and said calculating circuit is implemented using a second processor, and wherein data is pipelined from the front end to the calculating circuit to the search stage.
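The three stages recited in claim 1 can be illustrated with a minimal software sketch. Everything below is an assumption made for illustration: the two toy features, the Euclidean distance, the one-frame-per-node search, and all names. Chained generators model only the front end to calculating circuit to search stage data flow; the split across a first and a second processor is not modelled.

```python
import math


def front_end(frames):
    """Audio front end: one feature vector per audio time frame.
    Assumed features: mean amplitude and mean absolute amplitude."""
    for frame in frames:
        mean = sum(frame) / len(frame)
        energy = sum(abs(x) for x in frame) / len(frame)
        yield (mean, energy)


def calculating_circuit(feature_vectors, acoustic_states):
    """Distance from each feature vector to every predetermined acoustic state."""
    for fv in feature_vectors:
        yield {name: math.dist(fv, ref) for name, ref in acoustic_states.items()}


def build_lexical_tree(lexicon):
    """Prefix tree: words sharing leading states share nodes."""
    root = {"children": {}, "word": None}
    for word, states in lexicon.items():
        node = root
        for state in states:
            node = node["children"].setdefault(state, {"children": {}, "word": None})
        node["word"] = word
    return root


def search_stage(distance_stream, tree):
    """Token passing with exactly one frame per tree node (simplifying assumption)."""
    tokens = [(0.0, tree)]
    for distances in distance_stream:
        tokens = [
            (score + distances[state], child)
            for score, node in tokens
            for state, child in node["children"].items()
        ]
    finished = [(score, node["word"]) for score, node in tokens if node["word"]]
    return min(finished)[1] if finished else None


# Hypothetical acoustic model, lexicon, and audio frames.
acoustic_states = {"a": (0.0, 1.0), "b": (2.0, 2.0), "c": (-1.0, 1.0)}
lexicon = {"ab": ["a", "b"], "ac": ["a", "c"]}
frames = [[1.0, -1.0], [2.0, 2.0]]

distance_stream = calculating_circuit(front_end(frames), acoustic_states)
word = search_stage(distance_stream, build_lexical_tree(lexicon))
```

Because the stages are lazy generators, each frame passes through the front end and the distance calculation only when the search stage requests it, which loosely mirrors the pipelined data flow in the claim.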


Disclaimer: Apex Standards Claim Construction (CC) applies the Broadest Reasonable, Ordinary, or Customary Interpretation to the claim elements of a target patent or to the technical specification language of an industrial standard. This enables a top-down, a priori evaluation, allowing stakeholders to analyze the relevance and scope of the terms and phrases under consideration swiftly and effectively before making complex, high-value judgments. CC is intended to ease the initial burden of evidence by providing an exhaustive list of contextual semantic readings that can serve as building blocks for a prosecution-ready, licensing-ready, or litigation-ready work product. Stakeholders can then use the CC to revise the original claim or technical language, and to find more relevant prior art, in order to build strategy and accomplish additional goals.



Reference Claim Element / Subject Matter Feature Meaning Claim Construction & Interpretation
[1] 1 .

A speech recognition circuit [1]
speech recognition circuit [1] [Meaning 1] system
[Meaning 2] device
[Meaning 3] decoder
[Meaning 4] processor
[Meaning 5] circuit
[Meaning 6] speech recognizer
[Meaning 7] recognition system
[Meaning 8] search system
[Meaning 9] computer system
[Meaning 10] voice recognizer
[Meaning 11] spoken language recognizer
[Meaning 12] computer implemented speech recognizer
[Meaning 13] machine learning speech recognizer
[Meaning 14] system for recognizing spoken words
[Interpretation 1] voice recognition system for recognizing spoken words in an audio signal comprising at least one of speech and audio data and for generating corresponding recognition results
[Interpretation 2] speech recognition system for recognizing spoken words in an audio signal from which the spoken words are to be recognized
[Interpretation 3] voice recognition system for recognizing spoken words in an audio signal and for generating an output signal indicating said words
[Interpretation 4] method and apparatus for identifying words within an audio signal using acoustic models and lexical trees for the identified words
[2] , comprising (wherein the circuit comprises, the recognition circuit comprising, which comprises at least, comprising the following elements, comprising at least the following, which is characterized by comprising, in particular an hmm circuit comprising) : an audio front end for calculating (performing the extraction of, receiving and extract of, extract and or deriving, obtaining at least part of, performing an analysis to extract, generating and processing at least, extract or deriving quantities of) a feature vector [2]
feature vector [2] [Meaning 1] vector
[Meaning 2] feature
[Meaning 3] characteristic
[Meaning 4] signal
[Meaning 5] features vector
[Meaning 6] parameter vector
[Meaning 7] vector feature
[Meaning 8] predetermined feature vector
[Meaning 9] feature vector derived
[Meaning 10] plurality of feature vectors
[Meaning 11] time domain feature vector
[Meaning 12] feature vector of speech
[Meaning 13] feature vector of an acoustic model
[Meaning 14] vector of features of speech extracted
[Interpretation 1] feature vector from an audio signal received from an audio input device or for deriving and storing said feature vector
[Interpretation 2] time domain feature vector of an audio signal and for extracting and/or deriving feature vectors
[3] from an audio signal [3]
signal [3] [Meaning 1] input
[Meaning 2] signals
[Meaning 3] stream
[Meaning 4] sample
[Meaning 5] input signal
[Meaning 6] speech signal
[Meaning 7] audio signal
[Meaning 8] data signal
[Meaning 9] signal input
[Meaning 10] signal being processed
[Meaning 11] signal of an utterance
[Meaning 12] stream of audio signals
[Meaning 13] input signal comprising speech
[Meaning 14] stream comprising an audio signal
[Interpretation 1] signal received from an audio input device and for output of said feature vector as part of said audio signal
[Interpretation 2] input signal and for using said calculated feature vector to identify words within an acoustic model of said audio signal
[Interpretation 3] sample of an audio signal and for storing said feature vector in an audio feature vector memory of the circuit
[4] , wherein the feature vector [2] comprises (comprises the values of, is calculated based on, indicates the distribution of, includes the values of, comprises at least one of, is calculated using values of, represents at least one quantity of) a plurality of extracted (values indicative of measured, features which are extracted, feature values representing measured, parameters which are measured, time series of measured, features representing the audio signal, time series of extracted quantities) and / or derived (calculated audio feature vector, calculated features that represent feature, computed features representing one or more, determined features and one or more, derived features of the audio signal, estimated features and corresponding time domain, calculated features of one or more physical) quantities [4]
quantities [4] [Meaning 1] features
[Meaning 2] parameters
[Meaning 3] values
[Meaning 4] characteristics
[Meaning 5] samples
[Meaning 6] audio features
[Meaning 7] feature values
[Meaning 8] signal features
[Meaning 9] features calculated
[Meaning 10] data values
[Meaning 11] speech features extracted
[Meaning 12] feature values that occur
[Meaning 13] features of the audio signal
[Meaning 14] feature values of at least one audio frame
[Interpretation 1] parameters of the audio signal and wherein the audio front end is configured to extract and/or derive said parameters
[Interpretation 2] feature values of the audio signal and wherein the audio front end is configured to calculate said feature vector only
[5] from said audio signal [3] during a defined (predetermined portion of an, first portion of an, time window of an, portion of an input, predefine long or short, plurality of portions of an, time window corresponding to an) audio time frame [5]
time frame [5] [Meaning 1] period
[Meaning 2] interval
[Meaning 3] window
[Meaning 4] frame
[Meaning 5] duration
[Meaning 6] time period
[Meaning 7] analysis window
[Meaning 8] signal period
[Meaning 9] processing window
[Meaning 10] frame of time
[Meaning 11] or speech period
[Meaning 12] time of interest
[Meaning 13] portion of the signal
[Meaning 14] frame of the signal
[Interpretation 1] processing window of said audio signal and wherein the extracted and/or derived quantities are stored in an audio buffer
[Interpretation 2] time window and wherein the extracted and said derived quantities are calculated using an algorithm based on an acoustic model
[6] ; a calculating circuit [6]
circuit [6] [Meaning 1] circuitry
[Meaning 2] stage
[Meaning 3] means
[Meaning 4] unit
[Meaning 5] electronic circuit
[Meaning 6] circuit configured
[Meaning 7] circuitry stage
[Meaning 8] circuit comprising means
[Meaning 9] neural network circuit
[Meaning 10] and comparing circuit
[Meaning 11] or processing circuit
[Meaning 12] stage comprising calculating circuit
[Meaning 13] circuit coupled to said front end
[Meaning 14] means comprising at least one processor
[Interpretation 1] stage for using said feature vector to identify words within an acoustic model of said signal comprising an acoustic model
[Interpretation 2] means for calculating an acoustic model based on the feature vector and the audio time frame and comprising calculating means
[7] for calculating distances [7]
distances [7] [Meaning 1] distance
[Meaning 2] differences
[Meaning 3] values
[Meaning 4] parameters
[Meaning 5] respective distances
[Meaning 6] corresponding distances
[Meaning 7] distance values
[Meaning 8] distances each
[Meaning 9] an acoustic distance
[Meaning 10] distances between states
[Meaning 11] one or more distances
[Meaning 12] at least one distance
[Meaning 13] one or more distances from said feature vector
[8] indicating (wherein the distances represent, that are indicative of, between features indicative of, from the measure of, between feature vectors based on, from the feature vector representing, from said feature vector indicative of) the similarity [8]
similarity [8] [Meaning 1] distance
[Meaning 2] correspondence
[Meaning 3] differences
[Meaning 4] respective distances
[Meaning 5] euclidean distance
[Meaning 6] likelihood values
[Meaning 7] distance measures
[Meaning 8] degree of similarity
[Meaning 9] likelihood of correspondence
[Meaning 10] probability of correspondence
[Meaning 11] likelihood of an association
[Meaning 12] strength of the correspondence
[Meaning 13] magnitude of the differences
[Meaning 14] likelihood of an acoustic match
[Interpretation 1] likelihood of speech in said audio signal during said defined audio time frame using at least one of the distances
[9] between a feature vector [2] and a plurality of predetermined (feature vectors corresponding to, features corresponding to different, states comprising one or more, model vectors representing one or more, acoustic states selected from the plurality of, acoustic states of at least one set of, model vectors comprising the features of the different) acoustic states [9]
states [9] [Meaning 1] features
[Meaning 2] parameters
[Meaning 3] vectors
[Meaning 4] characteristics
[Meaning 5] models
[Meaning 6] feature vectors
[Meaning 7] model vectors
[Meaning 8] reference vectors
[Meaning 9] parameter vectors
[Meaning 10] models or parameters
[Meaning 11] features as part
[Meaning 12] parameters forming part
[Meaning 13] feature vector representations
[Meaning 14] feature vectors forming part
[10] of an acoustic model ; and a search stage [10]
search stage [10] [Meaning 1] searcher
[Meaning 2] stage
[Meaning 3] circuit
[Meaning 4] processor
[Meaning 5] searching stage
[Meaning 6] recognition stage
[Meaning 7] detection stage
[Meaning 8] first search stage
[Meaning 9] speech search stage
[Meaning 10] search stage configured
[Meaning 11] lexical tree search stage
[Meaning 12] search stage comprising means
[Meaning 13] word identification search stage
[Meaning 14] stage of the search
[Interpretation 1] recognition stage for identifying words within said acoustic model using said calculated distances and further comprising an additional search stage
[Interpretation 2] stage for searching for words in the audio signal using said calculated distances and said acoustic model comprising an algorithm
[11] for using said calculated distances [7] to identify (identify one or more, locate one or more, find one or more, determine the presence of, search for and identify, search for one or more, determine the likely occurrence of) words within a lexical tree , the lexical tree comprising a model of words ; wherein said audio front end and said search stage [10] are implemented (at least partially implemented, in each case implemented, coupled to each other, implemented in one chip, in communication with each other, integrated into one circuit implemented, implemented in parallel with each other) using a first processor [11]
processor [11] [Meaning 1] microprocessor
[Meaning 2] computer
[Meaning 3] digital processor
[Meaning 4] hardware processor
[Meaning 5] processing unit
[Meaning 6] processor circuit
[Meaning 7] or main processor
[Meaning 8] program controlled processor
[Meaning 9] one or more processors
[Meaning 10] of two separate processors
[Meaning 11] processor having multiple cores
[Meaning 12] non program digital processor
[Meaning 13] multi core processor core
[Meaning 14] processor of the same type
[12] , and said calculating circuit [6] is implemented using a second processor [11] , and wherein data is pipelined from the front end to the calculating circuit [6] to the search stage [10] .