Purpose: Invalidity Analysis


Patent: US8990073B2
Filed: 2007-06-22
Issued: 2015-03-24
Patent Holder: (Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC
Inventor(s): Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami

Title: Method and device for sound activity detection and sound signal classification

Abstract: A device and method for estimating a tonal stability of a sound signal include: calculating a current residual spectrum of the sound signal; detecting peaks in the current residual spectrum; calculating a correlation map between the current residual spectrum and a previous residual spectrum for each detected peak; and calculating a long-term correlation map based on the calculated correlation map, the long-term correlation map being indicative of a tonal stability in the sound signal.
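
The steps named in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the patented method: the function names, the piecewise-linear spectral floor, the fixed `half_width` window around each peak, and the smoothing factor `alpha` are all assumptions made for the example.

```python
def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the local minima (assumed floor model)."""
    n = len(spectrum)
    # Local minima of the spectrum; the two end bins are always included.
    minima = [0] + [i for i in range(1, n - 1)
                    if spectrum[i] <= spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]] + [n - 1]
    floor = [0.0] * n
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a) if b > a else 0.0
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [max(s - f, 0.0) for s, f in zip(spectrum, floor)]


def detect_peaks(residual):
    """Return the indices of local maxima in the residual spectrum."""
    return [i for i in range(1, len(residual) - 1)
            if residual[i] > residual[i - 1] and residual[i] >= residual[i + 1]]


def correlation_map(current, previous, peaks, half_width=2):
    """Normalized correlation between current and previous residuals around each peak."""
    n = len(current)
    cmap = [0.0] * n
    for p in peaks:
        lo, hi = max(0, p - half_width), min(n, p + half_width + 1)
        num = sum(current[i] * previous[i] for i in range(lo, hi))
        den = (sum(current[i] ** 2 for i in range(lo, hi))
               * sum(previous[i] ** 2 for i in range(lo, hi))) ** 0.5
        value = num / den if den > 0 else 0.0
        for i in range(lo, hi):
            # Overlapping windows keep the larger correlation (a choice made for this sketch).
            cmap[i] = max(cmap[i], value)
    return cmap


def update_long_term_map(long_term, cmap, alpha=0.9):
    """Exponentially smooth the per-frame correlation map over time."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]
```

In this sketch, a stationary tone keeps correlating with itself from frame to frame, so its bins in the long-term map drift toward 1, while bins that carry no peak decay toward 0.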




Disclaimer: Apex Standards Pseudo Claim Charting (PCC) is not intended to replace expert opinion but to provide due diligence and transparency prior to high-precision charting. PCC conducts aggressive mapping (based on Broadest Reasonable, Ordinary or Customary Interpretation and Multilingual Translation) between a target patent's claim elements and other documents (potential technical standard specifications or prior art in the same or different jurisdictions). This allows a top-down, a priori evaluation with which stakeholders can assess standard essentiality (potential strengths) or invalidity (potential weaknesses) quickly and effectively before making complex, high-value decisions. PCC is designed to relieve the initial burden of proof via an exhaustive listing of contextual semantic mappings as potential building blocks towards a litigation-ready work product. Stakeholders may then build on the shortlisted PCC mappings or identify other relevant materials in order to formulate strategy and achieve further purposes.




Ground    References    Owner of the Reference    Title    Semantic Mapping    Challenged Claims
1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19 20 21 22 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
1

SPEECH COMMUNICATION. 30 (4): 207-221 APR 2000

(Verhelst, 2000)
Katholieke Universiteit Leuven (KU Leuven)
Overlap-add Methods For Time-scaling Of Speech
second order reconstruction method
background noise signal initial phase
XXXXXX
2

IEEE TRANSACTIONS ON SIGNAL PROCESSING. 39 (12): 2573-2592 DEC 1991

(Kates, 1991)
Center for Res. in Speech & Hearing Sci., City Univ of New York, NY, USA
A TIME-DOMAIN DIGITAL COCHLEAR MODEL
initial value traveling wave
second energy values filter section
XXXX
3

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING. 14 (4): 1218-1234 JUL 2006

(Chen, 2006)
Lucent Technologies, Inc., Université du Québec, Katholieke Universiteit Leuven (KU Leuven)
New Insights Into The Noise Reduction Wiener Filter
binary decision linear prediction coefficient
adaptive threshold noise reduction
noise character parameter, activity prediction parameter speech signal
noise ratio, SNR LT noise ratio
XXXXXXXXX
4

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS. : 237-240 2004

(Fuchs, 2004)
TEMIC Speech Dialog Systems (SDS), Ulm (Germany)
Noise Suppression For Automotive Applications Based On Directional Information
noise energy estimates, noise estimates spatial information
sound signal noise attenuation
current frame energy, average frame energy adaptive beam
noise ratio, SNR LT noise ratio
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
5

IEEE TRANSACTIONS ON SIGNAL PROCESSING. 52 (5): 1149-1160 MAY 2004

(Cohen, 2004)
The Technion – Israel Institute of Technology (הטכניון – מכון טכנולוגי לישראל)
Multichannel Post-filtering In Nonstationary Noise Environments
initial value power spectral density
SNR calculation, SNR LT noise environment, noise power
average signal, sound signal noise component
noise energy estimates, second energy values proposed method
noise ratio clean signal
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
6

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS. : 901-904 2002

(Cohen, 2002)
Lamar Signal Processing Ltd
Microphone Array Post-filtering For Non-stationary Noise Suppression
initial value additional reduction
sound signal, sound activity microphone array, noise component
noise energy estimates, second energy values proposed method
current frame energy, average frame energy adaptive beam
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
7

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA. 110 (6): 3218-3231 DEC 2001

(Liu, 2001)
University of Illinois, Motorola Labs
A Two-microphone Dual Delay-line Approach For Extraction Of A Speech Sound In The Presence Of Multiple Interferers
SNR av binaural processing
average frame energy sound sources
X
8

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING. 8 (2): 146-158 MAR 2000

(Graupe, 2000)
University of Illinois
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration
noise energy estimates, noise estimates adaptive filter
noise ratio, SNR LT noise ratio
XXXXXXXXXXX
9

2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI. : 1875-1878 2000

(Stahl, 2000)
Philips Research Lab, Aachen, Germany
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering
current frame energy, average frame energy audio signal processing
frequency bins noise estimation
background noise signal, frequency dependent signal spectral domain
adaptive threshold noise reduction
noise character parameter, activity prediction parameter speech signal
SNR LT, SNR calculation noise power
XXXXXXXXXXXXXXXXX
10

JOURNAL OF THE AUDIO ENGINEERING SOCIETY. 47 (4): 240-251 APR 1999

(Jeong, 1999)
Korea Advanced Institute of Science and Technology (KAIST, 한국과학기술원)
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model
frequency bin fast Fourier transform
music signal frequency analysis
second order analysis method
linear prediction residual error energies sound signals
XXXXXXXXX
11

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6. : 1001-1004 1998

(Mizumachi, 1998)
Japan Advanced Institute of Science and Technology (JAIST 北陸先端科学技術大学院大学)
Noise Reduction By Paired-microphones Using Spectral Subtraction
sound signal, sound activity microphone array
adaptive threshold noise reduction
noise character parameter, activity prediction parameter speech signal
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
12

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE. 89 (2): 43-53 2006

(Kato, 2006)
NEC Media & Information Research Labs
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA
frequency bands, first frequency bands quality evaluations
frequency bins noise estimation, spectral gain
current frame energy noise suppressor
noise energy estimates, second energy values proposed method
noise ratio, SNR LT noise ratio
XXXXXXXXXXXXXXXXX
13

US20070088540A1

(Toshiyuki Ohta, 2007)
(Original Assignee) Fujitsu Ltd     

(Current Assignee)
Fujitsu Ltd
Voice data processing method and device
noise ratio second value
SNR calculation signal data
XX
14

US20060130637A1

(Jean-Luc Crebouw, 2006)
(Original Assignee) Jean-Luc Crebouw
Method for differentiated digital voice and music processing, noise filtering, creation of special effects and device for carrying out said method
noise estimator pre-processed signal
background noise signal first coefficient
activity prediction parameter negative value
linear prediction ambient noise
sound signal sound signal
XXXXXXXXXXXXXXXXXXXXXXXXXXXX
15

JP2007025290A

(Akihisa Kawamura, 2007)
(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社
マルチチャンネル音響コーデックにおける残響を制御する装置
frequency spectrum スペクトル領域
pole filter フィルタ処理, ローパス
XXXXX
16

US20060224381A1

(Jari Makinen, 2006)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Oyj
Detecting speech frames belonging to a low energy sequence
background noise signal first coefficient
residual error energy threshold
second order following steps
frequency spectrum, noise estimator speech encoder
noise character parameter, activity prediction parameter speech signal
current frame current frame
frequency bin basis energy levels
next frame when r
XXXXXXXXXXXXXXXXXXXXXX
17

US20050143989A1

(Milan Jelinek, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Method and device for speech enhancement in the presence of background noise
noise energy estimates noise energy estimates
frequency spectrum domain representation
noise ratio, SNR LT noise ratio
XXXXXXXXXXXXXXX
18

US20050065781A1

(Andreas Tell, 2005)
(Original Assignee) Empire Interactive Europe Ltd     

(Current Assignee)
Empire Interactive Europe Ltd
Method for analysing audio signals
average frame energy predetermined time interval
first frequency, first group analysis filterbank, band signals
music signal frequency analysis
background noise signal, noise ratio extracted signal, local maxima
average signal, sound signal prevents updating frequency noise
second group steps a
XXXXXXXXXXXXXXXXXXXXX
19

US20040133424A1

(Douglas Ealey, 2004)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Motorola Solutions Inc
Processing speech signals
binary decision significant speech
frequency spectrum frequency spectrum
average signal respective peak
first frequency first frequency
update factor following group
sound activity detection speech detector
detecting sound activity frequency value
frequency bins frequency bins
noise character parameter, activity prediction parameter speech signal
second energy values given number
frequency bands band data
XXXXXXXXXXXXXXXXXXXXXXXXX
20

US6988064B2

(Tenkasi V. Ramabadran, 2006)
(Original Assignee) International Business Machines Corp; Motorola Solutions Inc     

(Current Assignee)
International Business Machines Corp ; Google Technology Holdings LLC
System and method for combined frequency-domain and time-domain pitch extraction for speech signals
correlation value correlation value
activity prediction parameter previous frame
pole filter second pitch, first pitch
XXXX
21

US20040181393A1

(Frank Baumgarte, 2004)
(Original Assignee) Agere Systems LLC     

(Current Assignee)
MUCH SHELIST FREED DENENBERG ARNENT & RUBENSTEIN PC ; Avago Technologies International Sales Pte Ltd
Tonal analysis for perceptual audio coding using a compressed spectral representation
linear prediction residual error energies inverse discrete cosine
frequency spectrum domain representation
first frequency first frequency
XXXXXXX
22

US7124075B2

(Dmitry Edward Terez, 2006)
(Original Assignee) Dmitry Edward Terez
Methods and apparatus for pitch determination
adaptive threshold predetermined threshold value
consecutive minima, two consecutive minima singular value decomposition
noise character parameter, activity prediction parameter linear transformation, speech signal
noise character Euclidean distance
next frame selected pairs
current residual spectrum sample values
frequency bins said subset
SNR calculation ordered set
XXXXXXXXXXXXXXXXXXXXX
23

CN1543639A

(P・黄, 2004)
(Original Assignee) 高通股份有限公司
强壮语音分类方法和装置
activity prediction parameter 至少一个参数
linear prediction residual error energies 音调信息
update decision, preventing update 经更新
update factor 一种语
XXXXXXXXX
24

CN1447963A

(J·塞斯, 2003)
(Original Assignee) 康奈克森特系统公司
语音编码中噪音鲁棒分类方法
activity prediction parameter 至少一个参数
adaptive threshold 一个阈值
SNR calculation 语音处理
noise character parameter 这些参数
update factor 一种语
average frame energy 中设置
XXXXXXXXXXXX
25

CN1624766A

(J·塞斯, 2005)
(Original Assignee) 康奈克森特系统公司
语音编码中噪音鲁棒分类方法
activity prediction parameter 至少一个参数
adaptive threshold 预定阈值, 阈值进行
noise character parameter 多个参数
XXXXXXX
26

US6636829B1

(Adil Benyassine, 2003)
(Original Assignee) Mindspeed Technologies LLC     

(Current Assignee)
HTC Corp ; WIAV Solutions LLC
Speech communication system and method for handling lost frames
consecutive minima minimum difference
frequency spectrum frequency spectrum
second group, term correlation map decoding method
two consecutive minima signal detector
frequency bin periodic signal
sound activity, sound activity detection gain codebook
current frame current frame
noise ratio second value
XXXXXXXXXXXXXXXXXXXXXXXX
27

US20010023395A1

(Huan-Yu Su, 2001)
(Original Assignee) Lakestar Semi Inc     

(Current Assignee)
Samsung Electronics Co Ltd
Speech encoder adaptively applying pitch preprocessing with warping of target signal
linear prediction, residual error linear prediction
sound signal adaptive codebook
sound activity, detecting sound activity second encoding
frequency spectrum, noise estimator speech encoder
noise character parameter, activity prediction parameter speech signal
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
28

CN1159639A

(保罗·E·雅各布, 1997)
(Original Assignee) 夸尔柯姆股份有限公司
可变速率声码器
linear prediction 预测编码
background noise signal 背景噪声
second energy values 一个输出
detecting sound activity 声音信号
XXXXXXX
29

US5848388A

(Kevin Joseph Power, 1998)
(Original Assignee) British Telecommunications PLC     

(Current Assignee)
British Telecommunications PLC
Speech recognition with sequence parsing, rejection and pause detection options
activity prediction parameter probability distributions
first energy value, second energy value different states
adaptive threshold minimum values
frequency bin basis energy levels
noise ratio second value
XXXXXXXX
30

CN1131473A

(安德鲁·P·德雅克, 1996)
(Original Assignee) 夸尔柯姆股份有限公司
在速率可变的声码器中选择编码速率的方法和装置
adaptive threshold 一个阈值
background noise signal 背景噪声
SNR calculation 计算装置
XXXXXXX
31

CN1512487A

(安德鲁・P・德雅克, 2004)
(Original Assignee) 夸尔柯姆股份有限公司
在速率可变的声码器中选择编码速率的方法和装置
background noise signal 背景噪声
SNR calculation 计算装置, 来计算
XXXXXX
32

CN1945696A

(安德鲁·P·德雅克, 2007)
(Original Assignee) 高通股份有限公司
在速率可变的声码器中选择编码速率的方法和装置
adaptive threshold 一个阈值
first group 残留信号
background noise signal 背景噪声
SNR calculation 计算装置
XXXXXXXX
33

EP1233408A1

(Andrew P. Dejaco, 2002)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Method and apparatus for selecting an encoding rate in a variable rate vocoder
first frequency, first frequency bands assigned frequency, band signal
current residual spectrum frequency subbands
average signal bandpass filter
noise estimates combined signal, subband filter
SNR calculation rate selection
XXXXXXXXXXXX
34

EP1703493A2

(Andrew P. Dejaco, 2006)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Method and apparatus for selecting an encoding rate in a variable rate vocoder
adaptive threshold second threshold
noise ratio, SNR LT noise ratio
XXX
35

US5751903A

(Kumar Swaminathan, 1998)
(Original Assignee) Hughes Electronics Corp     

(Current Assignee)
JPMorgan Chase Bank NA ; Hughes Network Systems LLC
Low rate multi-mode CELP codec that encodes line SPECTRAL frequencies utilizing an offset
second group, term correlation map decoding method
noise energy estimates, noise estimates extracted set
XXXXXXXXXXXX
36

JPH07334190A

(Tadashi Yonezaki, 1995)
(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社
高調波振幅値量子化装置
frequency spectrum, frequency dependent signal 高調波周波数
tonal stability tonal stability estimator フレーム
residual error の誤差
XXXXXXX
37

US5594833A

(Takeo Miyazawa, 1997)
(Original Assignee) Miyazawa; Takeo
Rapid sound data compression in code book creation
average frame energy predetermined time interval
correlation map correlation map
correlation value initial values
first frequency said series
XXXXXXXXXX
38

JPH07114396A

(Atsushi Matsumoto, 1995)
(Original Assignee) Sony Corp; ソニー株式会社
ピッチ検出方法
sound signal 音声信号
music signal 出方法
XXXXXXXXXXXXXXXXXXXXXXXXXX
39

US5406635A

(Kari J. Jarvinen, 1995)
(Original Assignee) Nokia Mobile Phones Ltd     

(Current Assignee)
Intellectual Ventures I LLC
Noise attenuation system
sound signal noise attenuation
frequency dependent signal, frequency bands band signals, noise estimation
noise ratio, SNR LT noise ratio, noise power
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
40

CN1071036A

(保罗·E·雅各布, 1993)
(Original Assignee) 夸尔柯姆股份有限公司
可变速率声码器
linear prediction 预测编码
background noise signal 背景噪声, 一对应
second energy values 一个输出
detecting sound activity 声音信号
XXXXXXX
41

CN1381956A

(保罗·E·雅各布, 2002)
(Original Assignee) 夸尔柯姆股份有限公司
可变速率声码器
activity prediction parameter 至少一个参数
linear prediction 预测编码
XXXX
42

EP1107231A2

(William R. Gardner, 2001)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Variable rate decoder
binary decision linear prediction coefficient, block code
activity prediction parameter, noise character parameter linear predictive coefficient, previous frame
correlation value autocorrelation coefficients
XXXXXXX
43

EP1162601A2

(William R. Gardner, 2001)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Variable rate vocoder
noise ratio gain parameter, second value
current frame current frame
noise energy estimates, term signal random code
XXXXXXXXXXXXXXXXX
44

EP1239456A1

(William R. Gardner, 2002)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Variable rate vocoder
consecutive minima, two consecutive minima second limiter, first limiter
activity prediction parameter previous frame
current frame current frame
background noise signal first adder
XXXXXXXXXXXXXXXXX
45

US5040217A

(Karlheinz Brandenburg, 1991)
(Original Assignee) Nokia Bell Labs     

(Current Assignee)
Nokia Bell Labs ; AT&T Corp
Perceptual coding of audio signals
second group absolute value
noise character parameter function value
frequency spectrum said blocks
XXXXXXXX
46

US20070136059A1

(Gregory Gadbois, 2007)
(Original Assignee) Gadbois Gregory J
Multi-voice speech recognition
sound activity, sound activity detection acoustic models
linear prediction speech input
XXXXXXXXXXXXX
47

WO2007051548A1

(Lars Villemoes, 2007)
(Original Assignee) Coding Technologies Ab
Time warped modified transform coding of audio signals
second energy value second intermediate
current residual spectrum sample values
current frame, average frame two frames
next frame when r
XXXXXXXXXXXX
48

US20070071089A1

(Dohyung Kim, 2007)
(Original Assignee) Samsung Electronics Co Ltd     

(Current Assignee)
Samsung Electronics Co Ltd
Scalable audio encoding and decoding apparatus, method, and medium
linear prediction scalable decoding method
frequency bin, frequency bands high frequency band
detecting sound activity low frequency band
XXXXXXXX
49

CN1909060A

(金炫秀, 2007)
(Original Assignee) 三星电子株式会社
提取浊音/清音分类信息的方法和设备
current residual spectrum 系数计算
first energy 谐波模型
SNR calculation 单元计算, 信号计算
preventing update 残余能
XXXXXXXXXXX
50

JP2007065636A

(James P Ashley, 2007)
(Original Assignee) Motorola Inc; モトローラ・インコーポレイテッド Motorola Incorporated
音声通信システムにおいて快適雑音を生成する方法および装置
sixteenth order システム
sound signal 音声信号
XXXXXXXXXXXXXXXXXXXXXXXXX
51

CN1905006A

(荒川隆行, 2007)
(Original Assignee) 日本电气株式会社
噪声抑制系统与方法及程序
sound signal, sound signal prevents updating 进行上述
SNR calculation 信号计算
frequency bands 频率方向
first energy value 概率分布
detecting sound activity 声音信号
initial value, term value 平均值
XXXXXXXXXXXXXXXXXXXXXXXXXX
52

US20070016411A1

(Junghoe Kim, 2007)
(Original Assignee) Samsung Electronics Co Ltd     

(Current Assignee)
Samsung Electronics Co Ltd
Method and apparatus to encode/decode low bit-rate audio signal
frequency bins, frequency bin high frequency component, high frequency band
first frequency first frequency
sound activity, sound activity detection more codebooks
XXXXXXXXXXXXXXXXXX
53

WO2007001068A1

(Chia-Shin Yen, 2007)
(Original Assignee) Matsushita Electric Industrial Co., Ltd.
Sound classification system and method capable of adding and correcting a sound type
linear prediction, activity prediction parameter classification results
second order following steps
frequency bins frequency bins
linear prediction residual error energies sound signals
noise character parameter, prevent update input window
frequency bands splay area
XXXXXXXXXXXX
54

JP2007011341A

(David Giesbrecht, 2007)
(Original Assignee) Harman Becker Automotive Systems-Wavemakers Inc; ハーマン ベッカー オートモーティブ システムズ−ウェーブメーカーズ, インコーポレイテッド
高調波信号の周波数拡張
frequency spectrum, frequency dependent signal 高調波周波数
tonal sound, tonal stability tonal stability estimation モジュール
term signal デジタル
background noise signal ...
XXXXXXXXXXX
55

US20060251178A1

(Masahiro Oshikiri, 2006)
(Original Assignee) Panasonic Corp     

(Current Assignee)
Panasonic Intellectual Property Corp
Encoder apparatus and decoder apparatus
frequency spectrum filter characteristic
second group, term correlation map decoding method
frequency bands, first frequency bands band signal, second band
XXXXXXX
56

US20060271356A1

(Koen Vos, 2006)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Systems, methods, and apparatus for quantization of spectral envelope representation
correlation map, term correlation map corresponding portion
linear prediction, residual error linear prediction
frequency spectrum, noise estimator speech encoder
noise character parameter, activity prediction parameter speech signal
background noise signal first adder
XXXXXXXXXXXXXXXXXXXXXX
57

US20070088558A1

(Koen Vos, 2007)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Systems, methods, and apparatus for speech signal filtering
current residual spectrum different sampling rates
SNR calculation gain factors
XXXXXXXX
58

US20060147124A1

(Bernd Edler, 2006)
(Original Assignee) Agere Systems LLC     

(Current Assignee)
Agere Systems LLC
Perceptual coding of image signals using separated irrelevancy reduction and redundancy reduction
residual error redundancy reduction
average signal, average frame magnitude response, band signals
SNR LT filter input
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
59

WO2006067436A1

(Tobias Dahl, 2006)
(Original Assignee) Universitetet I Oslo; Samuels, Adrian, James
Channel impulse response estimation
consecutive minima, two consecutive minima singular value decomposition
frequency bands, first frequency bands same frequency band
current residual spectrum, current frame containing sample
frequency bins said subset
XXXXXXXXXXXXX
60

US20060036432A1

(Kristofer Kjorling, 2006)
(Original Assignee) Kristofer Kjorling; Per Ekstrand; Fredrik Henn; Lars Villemoes     

(Current Assignee)
Dolby International AB
Apparatus and method applying adaptive spectral whitening in a high-frequency reconstruction coding system
linear prediction, residual error linear prediction
second order following steps
noise estimates subband filter
first group, first frequency band signals
XXXXXXXXXXXXX
61

JP2006094522A

(Markus Buck, 2006)
(Original Assignee) Harman Becker Automotive Systems Gmbh; ハーマン ベッカー オートモーティブ システムズ ゲーエムベーハー
ノイズ低減による多重チャンネル適応の音声信号処理
tonal sound signal サブバンド
frequency bin ロッキング
sound signal 音声信号
tonal stability tonal stability estimator フレーム
XXXXXXXXXXXXXXXXXXXXXXXXXXX
62

JP2006085176A

(Bernd Iser, 2006)
(Original Assignee) Harman Becker Automotive Systems Gmbh; ハーマン ベッカー オートモーティブ システムズ ゲーエムベーハー
帯域制限オーディオ信号の帯域拡大
activity prediction parameter 特徴パラメータ
tonal stability, tonal sound ノイズ
current frame, average frame ワーク
binary decision, update decision の決定
XXXXXXXXXXXXXXXX
63

JP2007065226A

(Norihiro Hagita, 2007)
(Original Assignee) Advanced Telecommunication Research Institute International; 株式会社国際電気通信基礎技術研究所
ボーカル・フライ検出装置及びコンピュータプログラム
dependent signal 最大パワー
update factor のフィルタ
sound activity detection 検出手段, 検出装置
XXXXXXXX
64

US20070050189A1

(Edgardo Cruz-Zeno, 2007)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Google Technology Holdings LLC
Method and apparatus for comfort noise generation in speech communication systems
noise character noise character
activity prediction parameter, noise character parameter previous frame, speech signal
current frame current frame
term signal includes sets
XXXXXXXXXXXXXX
65

US20060053007A1

(Riitta Niemisto, 2006)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Solutions and Networks Oy
Detection of voice activity in an audio signal
frequency spectrum frequency spectrum
activity prediction parameter, noise character parameter previous frame, speech signal
XXXXXXXXXXX
66

US20050278174A1

(Hitoshi Sasaki, 2005)
(Original Assignee) Fujitsu Ltd     

(Current Assignee)
Fujitsu Ltd
Audio coder
sound signal, sound signal prevents updating reproduced signals
frequency bins, frequency bands sampled values
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
67

US20060171419A1

(Serafin Spindola, 2006)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Method for discontinuous transmission and accurate reproduction of background noise information
second group, sound activity detector signal processor
frequency bins packet format
noise estimates, noise estimator said model
XXXXXXXXX
68

US7216074B2

(David Malah, 2007)
(Original Assignee) AT&T Corp     

(Current Assignee)
Nuance Communications Inc
System for bandwidth extension of narrow-band speech
term value partial correlation
background noise signal first coefficient
frequency bin basis sampling rate
frequency bands, first frequency bands band signal
XXXXXXXXX
69

US20060241937A1

(Changxue Ma, 2006)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Motorola Solutions Inc
Method and apparatus for automatically discriminating information bearing audio segments and background noise audio segments
music signal frequency analysis
first frequency different times
frequency bands frequency bands
frequency spectrum joint time
XXXXXXXXXXXXX
70

US20050246164A1

(Pasi Ojala, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Oyj
Coding of audio signals
noise character parameter said time
second group steps a
first frequency time t
XXX
71

US20060098809A1

(Rajeev Nongpiur, 2006)
(Original Assignee) QNX Software Systems Wavemakers Inc     

(Current Assignee)
2236008 Ontario Inc ; 8758271 Canada Inc
Periodic signal enhancement system
current residual spectrum adaptive filter coefficient
consecutive minima, two consecutive minima signal enhancement
noise estimates, noise estimator adaptive filters
detecting sound activity detection output
frequency bin periodic signal
term signal enhanced signal
noise ratio, SNR LT noise ratio
initial value first stage
XXXXXXXXXXXXXXXX
72

US20050216261A1

(Philip Garner, 2005)
(Original Assignee) Canon Inc     

(Current Assignee)
Canon Inc
Signal processing apparatus and method
adaptive threshold predetermined threshold value, second threshold
noise character parameter, activity prediction parameter speech signal, first threshold value
XXXXXXX
73

US20050203735A1

(Osamu Ichikawa, 2005)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
International Business Machines Corp
Signal noise reduction
frequency spectrum frequency spectrum
average signal, sound signal noise component
noise character parameter, activity prediction parameter speech signal
frequency bin basis method steps
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
74

CN1922659A

(雅里·马基南, 2007)
(Original Assignee) 诺基亚公司
编码模式选择
adaptive threshold 预定阈值, 超过预定
next frame 至少基
SNR calculation 来计算
noise estimates 的噪声
XXXX
75

US20050192798A1

(Janne Vainio, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Classification of audio signals
term signal, sound signal prevents updating narrower bandwidth
first group first group
XX
76

CN1957398A

(布鲁诺·贝塞特, 2007)
(Original Assignee) 沃伊斯亚吉公司
在基于代数码激励线性预测/变换编码激励的音频压缩期间低频加重的方法和设备
frequency spectrum, frequency bins 频率分量
music signal, background noise signal 产生增益
current residual spectrum 系数计算, 获得的
successive minima 响应之间
binary decision 持续时间
SNR calculation 计算装置, 来计算
first frequency 宽扩展
update factor 的因子
XXXXXXXXXXXXXXXXXXX
77

WO2005078706A1

(Bruno Bessette, 2005)
(Original Assignee) Voiceage Corporation
Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
second energy value determined time period, following expressions
frequency bands bandwidth extension
activity prediction parameter previous frame
average frame energy, first energy value last portion
pole filter pole filter
current residual spectrum first wind
XXXXXXXXXXXXX
78

US20050177363A1

(Kwangcheol Oh, 2005)
(Original Assignee) Samsung Electronics Co Ltd     

(Current Assignee)
Samsung Electronics Co Ltd
Apparatus, method, and medium for detecting voiced sound and unvoiced sound
average frame predetermined threshold values
activity prediction parameter first threshold value
adaptive threshold second threshold
average signal third slope
music signal first slope
XXXXXXXXXXXX
79

US20050177364A1

(Milan Jelinek, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Methods and devices for source controlled variable bit-rate wideband speech coding
activity prediction parameter first threshold value, previous frame
sound activity high frequencies
adaptive threshold second threshold
current frame, current frame energy third threshold
average signal, sound signal prevents updating frequency noise
first frequency first frequency
frequency spectrum, noise estimator speech encoder
frequency bins frequency bins
initial value initial value
linear prediction residual error energies Comfort Noise
next frame when r
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
80

US20050267746A1

(Milan Jelinek, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Method for interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs
first frequency, first frequency bands algebraic codebook
linear prediction residual error energies Comfort Noise
XX
81

JP2005173607A

(Per Rune Albin Ekstrand, 2005)
(Original Assignee) Coding Technologies Ab; コーディング テクノロジーズ アクチボラゲット
時間的に離散した音声信号のアップサンプリングした信号を発生する方法と装置
consecutive minima 与える手段
residual error メーション
update factor のフィルタ
tonal sound signal サブバンド
XXXXXXX
82

US20050165603A1

(Bruno Bessette, 2005)
(Original Assignee) VoiceAge Corp     

(Current Assignee)
VoiceAge Corp
Method and device for frequency-selective pitch enhancement of synthesized speech
second order, second group encoding parameters
term value transfer function
noise energy estimates, noise estimates adaptive filter
current residual spectrum, pole filter -off frequency, lower band
adaptive threshold sampling means
XXXXXXXXXXXXXXXXXXX
83

US20050240399A1

(Jari Makinen, 2005)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Nokia Technologies Oy
Signal encoding
sound activity detector signal processing device
linear prediction, residual error linear prediction
noise energy estimates said second set
frequency bands frequency bands
frequency bin basis energy levels
noise ratio, SNR LT noise ratio
XXXXXXXXXXXXXXX
84

EP1672618A1

(Kok Seng Chong, 2006)
(Original Assignee) Panasonic Corp     

(Current Assignee)
Panasonic Corp
Method for deciding time boundary for encoding spectrum envelope and frequency resolution
first frequency, frequency bands band signals, frequency bands
activity prediction parameter previous frame
current frame current frame
noise estimator domain signal
XXXXXXXXXXXXXXXX
85

WO2005041169A2

(Anssi RÄMÖ, 2005)
(Original Assignee) Nokia Corporation; Nokia Inc.
Method and system for speech coding
noise character parameter, activity prediction parameter speech signal
prevent update base stations
first frequency first number
SNR calculation signal data
XXXXXXX
86

KR20060025203A

(알버투스 씨. 덴 브린커, 2006)
(Original Assignee) 코닌클리케 필립스 일렉트로닉스 엔.브이.     잡음 부가에 의한 디코딩된 오디오의 품질 개선 noise estimates, noise estimator 추정치를
first group 특성들을
XXX
87

WO2004114133A1

(Magnus HÖGSTEDT, 2004)
(Original Assignee) Abb Research Ltd.     Method to diagnose equipment status term signal computer readable media
noise character displaying information
term correlation map additional event
noise character parameter said time
XXXXX
88

US20050278171A1

(Seth Suppappola, 2005)
(Original Assignee) Acoustic Technologies Inc     

(Current Assignee)
Cirrus Logic Inc
Comfort noise generator using modified doblinger noise estimate sound activity detection speech detector
activity prediction parameter previous frame
frequency bins spectral gain
current frame current frame
SNR LT, SNR calculation noise power
XXXXXXXXXXXXXXXXXXX
89

JP2005195955A

(Ko Amada, 2005)
(Original Assignee) Toshiba Corp; 株式会社東芝     雑音抑圧装置及び雑音抑圧方法 noise character 切替えること
tonal sound signal サブバンド
binary decision の判定
XXXXXX
90

CN1735928A

(巴拉兹·科弗西, 2006)
(Original Assignee) 法国电信公司     用于可变速率音频编解码的方法 noise character parameter 这些参数
frequency bins N个比特
SNR calculation 来计算
XXXXXXXX
91

WO2004040830A1

(Jari MÄKINEN, 2004)
(Original Assignee) Nokia Corporation     Variable rate speech codec second energy, second energy value different rates
residual error residual error
noise character parameter, activity prediction parameter speech signal
initial value, correlation value limit values
XXXXXXXXXXX
92

US20040128126A1

(Young Nam, 2004)
(Original Assignee) WILDERTHANCOM Co Ltd     

(Current Assignee)
WILDERTHANCOM Co Ltd ; Realnetworks Asia Pacific Co Ltd
Preprocessing of digital audio data for mobile audio codecs adaptive threshold automatic gain control
sound activity, sound activity detection signal level
XXXXXXXXXXXX
93

JP2005110127A

(Katsutoshi Takahashi, 2005)
(Original Assignee) Canon Inc; キヤノン株式会社     風雑音検出装置及びそれを有するビデオカメラ装置 tonal sound, sound activity 風雑音検出
first group えること
sound signal 音声信号
sound activity detection 検出手段, 検出装置
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
94

WO2004027368A1

(Naoya Tanaka, 2004)
(Original Assignee) Matsushita Electric Industrial Co., Ltd.; Nec Corporation     Audio decoding apparatus and method frequency bins, frequency bin high frequency component
next frame adjacent sub
noise ratio square root
XXXXXXX
95

US20040225505A1

(Robert Andersen, 2004)
(Original Assignee) Dolby Laboratories Licensing Corp     

(Current Assignee)
Dolby Laboratories Licensing Corp
Audio coding systems and methods using spectral component coupling and spectral component regeneration first frequency analysis filterbank
frequency bands, first frequency bands band information
first energy more output
frequency bin, frequency bin basis third sets
XXXX
96

GB2400003A

(Halil Fikretler, 2004)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Motorola Solutions Inc
Pitch estimation within a speech signal activity prediction parameter, noise character parameter previous frame, speech signal
update factor Kalman filter
residual error sample point
correlation value peak values
XXXXXXXXXXX
97

US7209567B1

(David Kozel, 2007)
(Original Assignee) Purdue Research Foundation     

(Current Assignee)
Purdue Research Foundation
Communication system with adaptive noise suppression adaptive threshold noise reduction
frequency bins frequency bins
noise character parameter, activity prediction parameter speech signal
noise ratio, SNR LT noise ratio
frequency spectrum time frames
XXXXXXXXXXXXXXXXXX
98

EP1474755A1

(Lloyd Watts, 2004)
(Original Assignee) Audience LLC     

(Current Assignee)
Audience LLC
Filter set for frequency analysis sound signal, sound signal prevents updating frequency channel
first frequency first frequency
frequency bins sampled signal
background noise signal, noise ratio first filter
second energy audio stream
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
99

US6985856B2

(Ye Wang, 2006)
(Original Assignee) Nokia Oyj     

(Current Assignee)
Provenance Asset Group LLC ; Nokia USA Inc
Method and device for compressed-domain packet loss concealment prevent update transform coefficient
second energy audio stream
XX
100

US20030078770A1

(Alexander Fischer, 2003)
(Original Assignee) Deutsche Telekom AG     

(Current Assignee)
Deutsche Telekom AG
Method for detecting a voice activity decision (voice activity detector) binary decision linear prediction coefficient
correlation value output values
initial value first stage
first frequency time t
XXXXXX
101

CN1539137A

(戴维・王, 2004)
(Original Assignee) 格鲁斯番 维拉塔公司; 格鲁斯番维拉塔公司     产生有色舒适噪声的方法和系统 adaptive threshold 个检测器
pole filter 一个标识
background noise signal 背景噪声
average frame 装置适
XXXXXXX
102

CN1539138A

(瓦苏德夫・S・纳雅克, 2004)
(Original Assignee) 格鲁斯番维拉塔公司     执行低复杂性频谱估计技术来产生舒适噪声的方法和系统 adaptive threshold 个检测器
linear prediction 预测编码
correlation value 预测信号
noise estimates 的噪声
XXXX
103

EP1265224A1

(Dunling Li, 2002)
(Original Assignee) Telogy Networks Inc     

(Current Assignee)
Telogy Networks Inc
Method for converging a G.729 annex B compliant voice activity detection circuit correlation value, term value autocorrelation coefficients, said first value
second energy digital representation
noise character noise character
prevent update n times
XXXXXX
104

US7054808B2

(Koji Yoshida, 2006)
(Original Assignee) Panasonic Corp     

(Current Assignee)
Panasonic Corp
Noise suppressing apparatus and noise suppressing method initial value determining section
sound activity detection, sound signal prevents updating conversion section
background noise signal first coefficient
noise character parameter, activity prediction parameter speech signal
frequency spectrum, frequency bands greater part
XXXXXXXXXXXXXXXXXXXXXX
105

US7065486B1

(Jes Thyssen, 2006)
(Original Assignee) Mindspeed Technologies LLC     

(Current Assignee)
MACOM Technology Solutions Holdings Inc ; WIAV Solutions LLC
Linear prediction based noise suppression binary decision linear prediction coefficient
noise character parameter said time
XXXX
106

US20030144840A1

(Changxue Ma, 2003)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Google Technology Holdings LLC
Method and apparatus for speech detection using time-frequency variance current frame given frequency
average signal bandpass filter
term value shift register
pole filter pass filters
prevent update time sample
XXXXXXXXX
107

US7065485B1

(Nicola R. Chong-White, 2006)
(Original Assignee) AT&T Corp     

(Current Assignee)
Nuance Communications Inc
Enhancing speech intelligibility using variable-rate time-scale modification adaptive threshold differential pulse
linear prediction, residual error linear prediction
noise ratio second value
XXX
108

JP2003195881A

(Takao Yamabe, 2003)
(Original Assignee) Victor Co Of Japan Ltd; 日本ビクター株式会社     周波数変換ブロック長適応変換装置及びプログラム current residual spectrum 周波数スペクトル
binary decision, update decision 決定手段
XXXXXXXXX
109

US20030110029A1

(Masoud Ahmadi, 2003)
(Original Assignee) Nortel Networks Ltd     

(Current Assignee)
Nortel Networks Ltd
Noise detection and cancellation in communications systems correlation value successive frames
current frame comparison means
second order, sixteenth order includes means
XXXXXXXXX
110

US6785645B2

(Hosam Adel Khalil, 2004)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Real-time speech and music classifier correlation value pattern recognition method
initial value predefined criterion
two consecutive minima Mahalanobis distance
noise character Euclidean distance
second energy, current frame energy switching time
current frame data frames
average signal test window
XXXXXXXXXXXXXX
111

US20030101050A1

(Hosam Khalil, 2003)
(Original Assignee) Microsoft Corp     

(Current Assignee)
Microsoft Technology Licensing LLC
Real-time speech and music classifier correlation value pattern recognition method
initial value predefined criterion
two consecutive minima Mahalanobis distance
noise character Euclidean distance
second energy, current frame energy switching time
current frame current frame
average signal test window
XXXXXXXXXXXXXX
112

US20030101052A1

(Lang Chen, 2003)
(Original Assignee) VIPEX TECHNOLOGIES Inc     

(Current Assignee)
VIPEX TECHNOLOGIES Inc
Voice recognition and activation system noise energy estimates respective output signals
adaptive threshold second threshold
first energy value, second energy value repeating step
current residual spectrum frequency band
pole filter first segment
noise ratio second value
current frame, current frame energy middle line
XXXXXXXXXXXXXXXXXXXXX
113

US20030093278A1

(David Malah, 2003)
(Original Assignee) AT&T Corp     

(Current Assignee)
Nuance Communications Inc
Method of bandwidth extension for narrow-band speech activity prediction parameter linear predictive coefficient
frequency bands bandwidth extension, band data
current residual spectrum, residual error prediction residual, lower band
term value partial correlation, transfer function
frequency bin basis sampling rate
sound signal, sound activity detector higher band
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
114

CN1344067A

(F·伍帕曼, 2002)
(Original Assignee) 皇家菲利浦电子有限公司     采用不同编码原理的传送系统 first frequency, first frequency bands 至少其中一个, 第二解
current residual spectrum 获得的
XXXXXXXX
115

US20010044722A1

(Harald Gustafsson, 2001)
(Original Assignee) Telefonaktiebolaget LM Ericsson AB     

(Current Assignee)
Optis Wireless Technology LLC
System and method for modifying speech signals first frequency, frequency bands predetermined frequency range, harmonic frequencies
frequency bin fast Fourier transform
frequency spectrum frequency spectrum
current residual spectrum frequency band, lower band
residual error residual error
next frame analog format
XXXXXXXXXXXXXXXXXXXXX
116

US6708145B1

(Lars Gustaf Liljeryd, 2004)
(Original Assignee) Coding Technologies Sweden AB     

(Current Assignee)
Dolby International AB
Enhancing perceptual performance of sbr and related hfr coding methods by adaptive noise-floor addition and noise substitution limiting binary decision polynomial representation
second group, term correlation map decoding method
current residual spectrum, residual error source encoding
second order following steps
frequency bands, first frequency bands frequency bands, band frequency
consecutive minima, two consecutive minima minimum point
average frame local maximum
XXXXXXXXXXXX
117

US20010001853A1

(Anthony Mauro, 2001)
(Original Assignee) Mauro Anthony P.; Sih Gilbert C.     Low frequency spectral enhancement system and method sound activity fundamental frequencies
sound activity detection speech detector
noise ratio, SNR LT noise ratio
XXXXXXXXXXXX
118

EP1158494A1

(Oded Ghitza, 2001)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
Method and apparatus for performing audio coding and decoding by interleaving smoothed critical band evelopes at higher frequencies detecting sound activity low frequency band
frequency bands frequency bands
noise character parameter, activity prediction parameter speech signal
pole filter pass filters
music signal music signal
XXXXXXXXXXXX
119

US20020111798A1

(Pengjun Huang, 2002)
(Original Assignee) Qualcomm Inc     

(Current Assignee)
Qualcomm Inc
Method and apparatus for robust speech classification initial value correlation Coefficient Function threshold, parameter analyzer
current frame energy current frame energy
noise character parameter, activity prediction parameter speech signal
noise ratio Noise Ratio
XXXXXXXXXXX
120

JP2002118517A

(Kenichi Makino, 2002)
(Original Assignee) Sony Corp; ソニー株式会社     直交変換装置及び方法、逆直交変換装置及び方法、変換符号化装置及び方法、並びに復号装置及び方法 current residual spectrum なるサンプル
average frame フレーム間
binary decision, update decision 決定手段
first group えること
sound signal 音声信号
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
121

US7191123B1

(Bruno Bessette, 2007)
(Original Assignee) VoiceAge Corp     

(Current Assignee)
SAINT LAWRENCE COMMUNICATIONS LLC
Gain-smoothing in wideband speech and audio signal decoder sound signal adaptive codebook
prevent update base stations
XXXXXXXXXXXXXXXXXXXXXXXXXX
122

EP1216474A1

(Per Ekstrand, 2002)
(Original Assignee) Coding Technologies Sweden AB     

(Current Assignee)
Dolby International AB
Efficient spectral envelope coding using variable time/frequency resolution and time/frequency switching term signal received control signal
second energy values given number
noise character parameter said time
XXX
123

EP1093113A2

(Kenneth Finlon, 2001)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Motorola Solutions Inc
Method and apparatus for dynamic segmentation of a low bit rate digital voice message first energy value, second energy value repeating step
frequency spectrum, noise estimator speech encoder
activity prediction parameter previous frame
XXXXXXXXXX
124

US7139700B1

(Jacek Stachurski, 2006)
(Original Assignee) Texas Instruments Inc     

(Current Assignee)
Texas Instruments Inc
Hybrid speech coding and system linear prediction, residual error linear prediction
frequency spectrum, noise estimator speech encoder
XXXXXXX
125

US6355869B1

(Duane Mitton, 2002)
(Original Assignee) Duane Mitton     Method and system for creating musical scores from musical recordings first frequency, frequency spectrum sample rate
second energy values four points
XXXXXXXXXX
126

US6691082B1

(Joseph Gerard Aguilar, 2004)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
Method and system for sub-band hybrid coding binary decision linear prediction coefficient
noise estimator pre-processed signal
sound signal adaptive codebook
noise character parameter, activity prediction parameter excitation signal, speech signal
sound activity, detecting sound activity second encoding
average frame energy fixed codebook
current frame current frame
first group, first frequency band signals
linear prediction residual error energies time scale
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
127

EP1119911A1

(Béatrice PESQUET-POPESCU, 2001)
(Original Assignee) Koninklijke Philips NV     

(Current Assignee)
Koninklijke Philips NV
Filtering device pole filter filtering device
preventing update updating circuit
sound signal, music signal digital signals
background noise signal, noise ratio first filter
XXXXX
128

EP1120775A1

(Koji Yoshida, 2001)
(Original Assignee) Panasonic Corp     

(Current Assignee)
Panasonic Corp
Noise signal encoder and voice signal encoder background noise signal background noise signal
music signal readable record
XXXXXX
129

US6434417B1

(Eric G. Lovett, 2002)
(Original Assignee) Cardiac Pacemakers Inc     

(Current Assignee)
Cardiac Pacemakers Inc
Method and system for detecting cardiac depolarization next frame accordance therewith
average signal bandpass filter
background noise signal, noise ratio first filter
pole filter pass filters
XXXXXXXX
130

CN1437747A

(A·达斯, 2003)
(Original Assignee) 高通股份有限公司     闭环多模混合域线性预测(mdlp)语音编解码器 frequency spectrum, frequency bin 包括频率
adaptive threshold 预定阈值
pitch stability 稳定状态
SNR av 多项式
XXXXXXXXXXX
131

US7058572B1

(Elias J. Nemer, 2006)
(Original Assignee) Nortel Networks Ltd     

(Current Assignee)
Apple Inc
Reducing acoustic noise in wireless and landline based telephony activity prediction parameter first threshold value, previous frame
adaptive threshold second threshold
frequency bins sampled values
second group absolute value
first frequency first number
frequency bin, frequency bands one band, square root
XXXXXXXXXXX
132

US6717991B1

(Harald Gustafsson, 2004)
(Original Assignee) Telefonaktiebolaget LM Ericsson AB     

(Current Assignee)
Optis Wireless Technology LLC
System and method for dual microphone signal noise reduction using spectral subtraction SNR calculation scalar multiplication
updating noise energy estimates controller estimates
first group weighting function
first frequency first frequency
noise character parameter, activity prediction parameter speech signal
frequency bin basis energy levels
XXXXXXXXXX
133

US6339758B1

(Hiroshi Kanazawa, 2002)
(Original Assignee) Toshiba Corp     

(Current Assignee)
Toshiba Corp
Noise suppress processing apparatus and method frequency bin fast Fourier transform
frequency spectrum frequency spectrum
average signal, sound activity noise component, arrival direction
linear prediction speech input
SNR LT, SNR calculation noise power
frequency bins first range
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
134

US6226616B1

(Yu-Li You, 2001)
(Original Assignee) Digital Theater Systems Inc     

(Current Assignee)
DTS LLC
Sound quality of established low bit-rate audio coding systems without loss of decoder compatibility frequency spectrum, first energy value band signals, N subbands
music signal transition band
noise estimator N difference
current residual spectrum lower band
XXXXXXXXXXXXXXXXXXXXXXXXX
135

JP2000357000A

(Koji Yoshida, 2000)
(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     雑音信号符号化装置および音声信号符号化装置 noise estimator 音声信号復号化
sound activity detection 検出手段
update factor, update decision 更新後
binary decision の判定
SNR calculation 信号又
XXXXXXXXXXXXXX
136

CN1274456A

(S·P·维勒特, 2000)
(Original Assignee) 萨里大学     语音编码器 linear prediction 预测编码
binary decision 产生表征
first frequency 的第二个
term correlation map 系数对应
background noise signal 背景噪声
update factor 一种语, 的因子
preventing update 选择前
XXXXXXXXXXXX
137

US6351730B2

(Juin-Hwey Chen, 2002)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment sound activity detection Discrete Cosine Transform
consecutive minima successive time intervals
frequency spectrum domain representation
noise character parameter, noise character wise linear function
second order following steps
frequency bands, first frequency bands frequency bands, second band
current residual spectrum sample values
current frame current frame
frequency bin basis sampling rate
noise estimator domain signal
noise ratio, SNR LT noise ratio, Noise Ratio
XXXXXXXXXXXXXXXXXXXXXXX
138

US6363345B1

(Joseph Marash, 2002)
(Original Assignee) Andrea Electronics Corp     

(Current Assignee)
Andrea Electronics Corp
System, method and apparatus for cancelling noise frequency spectrum frequency spectrum
frequency bins noise estimation
adaptive threshold minimum values
XXXXXXXXXX
139

US6381570B2

(Dunling Li, 2002)
(Original Assignee) Telogy Networks Inc     

(Current Assignee)
Telogy Networks Inc
Adaptive two-threshold method for discriminating noise from speech in a communication signal residual error energy threshold
frequency bins relative values
current residual spectrum sample values
XXXXXXXXXXX
140

US6680972B1

(Lars Gustaf Liljeryd, 2004)
(Original Assignee) Coding Technologies Sweden AB     

(Current Assignee)
LARS GUSAF LILJERYD ; Dolby International AB
Source coding enhancement using spectral-band replication average signal bandpass filter
frequency bands, first group frequency bands, band signals
pole filter pass filters
background noise signal first adder
XXXXXXXXXXXXXXXXX
141

US20030009325A1

(Raif Kirchherr, 2003)
(Original Assignee) Deutsche Telekom AG     

(Current Assignee)
Deutsche Telekom AG
Method for signal controlled switching between different audio coding schemes activity prediction parameter, noise character parameter previous frame, speech signal
current frame current frame
XXXXXXXXXXXXX
142

US6266633B1

(Alan Lawrence Higgins, 2001)
(Original Assignee) ITT Manufacturing Enterprises LLC     

(Current Assignee)
Harris Corp
Noise suppression and channel equalization preprocessor for speech and speaker recognizers: method and apparatus correlation map probability density function
frequency bin fast Fourier transform
average signal, sound signal noise component
sound activity detection, adaptive threshold input voice, sampling means
activity prediction parameter negative value
frequency bin basis sampling rate
noise character parameter said time
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
143

US6240386B1

(Jes Thyssen, 2001)
(Original Assignee) Lakestar Semi Inc     

(Current Assignee)
MACOM Technology Solutions Holdings Inc
Speech codec employing noise classification for noise compensation current residual spectrum, residual error source encoding
next frame when r
XXXXXXXXX
144

US6122610A

(Steven H. Isabelle, 2000)
(Original Assignee) Verance Corp     

(Current Assignee)
GCOMM Corp ; Verance Corp
Noise suppression for low bitrate speech coder frequency bins high frequency components, spectral gain
frequency bands, first frequency bands different frequency band, frequency bands
correlation value domain representations
first frequency domain filter
pole filter pole filter
XXXXXX
145

US6223090B1

(Douglas S. Brungart, 2001)
(Original Assignee) US Air Force     

(Current Assignee)
US Air Force
Manikin positioning for acoustic measuring correlation value domain representations
frequency bin fast Fourier transform
frequency spectrum, term correlation map frequency interval, frequency spectrum
average signal second microphones
background noise signal phase difference
second order, sixteenth order includes means, first position
pitch stability selected time
noise character parameter, noise character second motor, said time
SNR calculation least square
current frame said motor
adaptive threshold near field
XXXXXXXXXXXXXXXXXXXXXX
146

US6173255B1

(Dennis L. Wilson, 2001)
(Original Assignee) Lockheed Martin Corp     

(Current Assignee)
Lockheed Martin Corp ; Lockheed Martin Aerospace Corp
Synchronized overlap add voice processing using windows and one bit correlators music signal, background noise signal analog audio signal
linear prediction residual error energies current sample
XXXXXXX
147

US6449586B1

(Osamu Hoshuyama, 2002)
(Original Assignee) NEC Corp     

(Current Assignee)
NEC Corp
Control method of adaptive array and adaptive array apparatus noise character parameter non-linear function
noise estimates, noise estimator adaptive filters
second group absolute value
linear prediction residual error energies other signals
SNR calculation least square
XXXXXXX
148

JPH1198090A

(Kiyoko Tanaka, 1999)
(Original Assignee) Nec Corp; 日本電気株式会社     音声符号化/復号化装置 current residual spectrum スペクトル特性
update factor のフィルタ, の情報
tonal sound 音声以外
tonal stability tonal stability estimator フレーム
average signal, average frame 平均化, 記復号
SNR LT ine
SNR calculation 信号又
XXXXXXXXXXXX
149

US5990405A

(Don R. Auten, 1999)
(Original Assignee) Gibson Guitar Corp     

(Current Assignee)
Bank of America NA ; Gibson Brands Inc
System and method for generating and controlling a simulated musical concert experience sound signal audio reproduction
pitch stability control device
next frame audio outputs
two consecutive minima left channels
sound activity, sound activity detection signal level
adaptive threshold video source, back device
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
150

US6137349A

(Andreas Menkhoff, 2000)
(Original Assignee) TDK Micronas GmbH     

(Current Assignee)
TDK Micronas GmbH
Filter combination for sampling rate conversion second energy, current frame energy input low pass filter, fourth data
term value transfer function
frequency bin basis sampling rate
second order second order
pole filter output data
first frequency one second
noise character parameter said time
average signal one half
XXXXXXXX
151

US6061456A

(Douglas Andrea, 2000)
(Original Assignee) Andrea Electronics Corp     

(Current Assignee)
Andrea Electronics Corp
Noise cancellation apparatus frequency spectrum filter characteristic, phase response
average signal second microphones
sound signal, sound signal prevents updating electric signals
noise character reduction system
prevent update determined angle
linear prediction ambient noise
dependent signal, frequency dependent signal second term, first term
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
152

US6108626A

(Luca Cellario, 2000)
(Original Assignee) Robert Bosch GmbH; Centro Studi e Laboratori Telecomunicazioni SpA (CSELT)     

(Current Assignee)
CSELT- CENTRO STUDI E LABORATORI TELECOMUNICAZIONI SpA ; Robert Bosch GmbH ; Centro Studi e Laboratori Telecomunicazioni SpA (CSELT) ; Nuance Communications Inc
Object oriented audio coding frequency spectrum linear prediction analysis
frequency bands, first frequency bands different frequency band, predetermined bandwidth
linear prediction, activity prediction parameter classification results, speech signal
residual error energy threshold
current frame given frequency
noise energy estimates said second set
second order following steps
adaptive threshold sampling means
updating noise energy estimates different band
first energy first energy
second group second group, steps a
first group first group
XXXXXXXXXXXXXXXX
153

US5983139A

(Clemens Zierhofer, 1999)
(Original Assignee) MED EL Elektromedizinische Geraete GmbH     

(Current Assignee)
MED EL Elektromedizinische Geraete GmbH
Cochlear implant system first frequency bands binary sequence
pitch stability selected time
pole filter pass filters
average signal one half
XXX
154

US7016507B1

(Robert Brennan, 2006)
(Original Assignee) Ami Semiconductor Inc     

(Current Assignee)
Ami Semiconductor Inc ; BANK OF NOVA SCOTIA
Method and apparatus for noise reduction particularly in hearing aids term signal amplification signal
term value partial correlation, low signal
sound activity high frequencies
first frequency, activity prediction parameter successive time
frequency bin periodic signal
sound signal, music signal second noise, noise filter
frequency spectrum time frames
pole filter first block
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
155

US6317501B1

(Naoshi Matsuo, 2001)
(Original Assignee) Fujitsu Ltd     

(Current Assignee)
Fujitsu Ltd
Microphone array apparatus sound signal, sound activity microphone array
term value delay units
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
156

US6134518A

(Gilad Cohen, 2000)
(Original Assignee) International Business Machines Corp     

(Current Assignee)
Cisco Technology Inc
Digital audio signal coding using a CELP coder and a transform coder correlation value correlation value, more threshold
second group, sound activity detector signal processor, decoding method
frequency bin basis method steps
XXXXXXXX
157

US5978824A

(Shigeji Ikeda, 1999)
(Original Assignee) NEC Corp     

(Current Assignee)
NEC Corp
Noise canceler initial value, correlation value first maximum value
background noise signal second noise signal, first noise signal
noise energy estimates, noise estimates adaptive filter
SNR LT, SNR calculation noise power
linear prediction first power
first frequency time t
XXXXXXXXXXXXXXXXXX
158

US6070137A

(Leland S. Bloebaum, 2000)
(Original Assignee) Ericsson Inc     

(Current Assignee)
Ericsson Inc
Integrated frequency-domain voice coding using an adaptive spectral enhancement filter binary decision linear prediction coefficient
music signal, background noise signal analog audio signal
noise character noise character
adaptive threshold removing noise
noise ratio square root
XXXXXXXXXXX
159

US6018706A

(Jian-Cheng Huang, 2000)
(Original Assignee) Motorola Solutions Inc     

(Current Assignee)
Google Technology Holdings LLC
Pitch determiner for a speech analyzer second group absolute value
pole filter second pitch, first pitch
X
160

US5974380A

(Stephen Malcolm Smyth, 1999)
(Original Assignee) Digital Theater Systems Inc     

(Current Assignee)
DTS LLC
Multi-channel audio decoder adaptive threshold differential pulse
current residual spectrum frequency subbands
frequency bands, first frequency bands band frequency
XXXXXXXXXX
161

US5960389A

(Kari Jarvinen, 1999)
(Original Assignee) Nokia Mobile Phones Ltd     

(Current Assignee)
Nokia Technologies Oy
Methods for generating comfort noise during discontinuous transmission adaptive threshold predetermined threshold value
average frame quantization index
linear prediction other parameter
noise ratio gain parameter
linear prediction residual error energies CN parameters
second order second order
SNR calculation ordered set
frequency bins, pole filter odd number
XXXXXXXXX
162

US6140809A

(Wataru Doi, 2000)
(Original Assignee) Advantest Corp     

(Current Assignee)
Advantest Corp
Spectrum analyzer second energy value second intermediate
frequency spectrum frequency spectrum
background noise signal vertical blanking
average signal bandpass filter
first frequency first frequency
prevent update further control
current residual spectrum frequency band
first group one time slot
frequency bin basis monitor means
initial value first stage
noise character parameter said time
XXXXXXXXXXXXXXXXX
163

US5943429A

(Peter Handel, 1999)
(Original Assignee) Telefonaktiebolaget LM Ericsson AB     

(Current Assignee)
Telefonaktiebolaget LM Ericsson AB
Spectral subtraction noise suppression method initial value power spectral density
first group weighting function
XXXX
164

US6430295B1

(Peter Händel, 2002)
(Original Assignee) Telefonaktiebolaget LM Ericsson AB     

(Current Assignee)
Telefonaktiebolaget LM Ericsson AB
Methods and apparatus for measuring signal level and delay at multiple sensors second group, sound activity detector signal processor, signal source
current residual spectrum frequency band
noise ratio, SNR LT noise ratio
XXXXXXXXXXXX
165

US6072881A

(Frank X. Linder, 2000)
(Original Assignee) Chiefs Voice Inc     

(Current Assignee)
Chiefs Voice Inc
Microphone noise rejection system two consecutive minima characteristic frequency, half period
first energy, first energy value log information
adaptive threshold A/D converter
frequency bins, pole filter odd number
XXXXXXX
166

US5911128A

(Andrew P. DeJaco, 1999)
(Original Assignee) Dejaco; Andrew P.     Method and apparatus for performing speech frame encoding mode selection in a variable rate encoding system frequency bins high frequency components
average frame energy average frame energy
current frame, current frame energy third threshold
SNR calculation rate selection
activity prediction parameter previous frame
XXXXXXXXXXXXXXX
167

US5933495A

(Stephen S. Oh, 1999)
(Original Assignee) Texas Instruments Inc     

(Current Assignee)
Texas Instruments Inc
Subband acoustic noise suppression noise energy estimates, noise estimates adaptive filter
sound activity detection speech detector
linear prediction residual error energies input terminals
sound activity detector signal source
frequency bands, first frequency bands band signal, band noise
XXXXXXXXXXXXXXXXX
168

US5845243A

(Kevin Smart, 1998)
(Original Assignee) U S Robotics Mobile Communications Corp     

(Current Assignee)
HP Inc ; U S Robotics Mobile Communications Corp ; Hewlett Packard Enterprise Development LP
Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of audio information adaptive threshold quantized coefficient
second energy, second energy value digital audio
music signal sampled data
current frame data frames
pole filter output data
XXXXXXXXXXXXXXX
169

JPH10214100A

(Kazuyuki Iijima, 1998)
(Original Assignee) Sony Corp; ソニー株式会社     音声合成方法 first group えること
sound signal 音声信号
tonal stability tonal stability estimator フレーム
XXXXXXXXXXXXXXXXXXXXXXXXXX
170

US6097820A

(Michael D. Turner, 2000)
(Original Assignee) Nokia of America Corp     

(Current Assignee)
Nokia of America Corp
System and method for suppressing noise in digitally represented voice signals frequency bin fast Fourier transform
current frame energy noise suppressor
second energy, second energy value digital audio
noise character parameter, activity prediction parameter speech signal, said time
pole filter pass filters
noise ratio, SNR LT noise ratio
noise estimates, noise estimator said model
XXXXXXXXXXXXX
171

US6570991B1

(Eric D. Scheirer, 2003)
(Original Assignee) Interval Research Corp     (Current Assignee) Vulcan Patents LLC
Multi-feature speech/music discrimination system
average frame energy     dimensional feature space, feature values
correlation value     successive frames
frequency bands     frequency bands
first frequency bands     containing data
first frequency     said series
XXXX
172

US5839101A

(Antti Vahatalo, 1998)
(Original Assignee) Nokia Mobile Phones Ltd     (Current Assignee) Nokia Technologies Oy
Noise suppressor and method for suppressing background noise in noisy speech, and a mobile station
frequency bins     noise estimation
average signal, sound signal     noise component
first frequency     first frequency
adaptive threshold     sampling means
noise estimator     domain signal
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
SPEECH COMMUNICATION. 30 (4): 207-221 APR 2000

Publication Year: 2000

Overlap-add Methods For Time-scaling Of Speech

Katholieke Universiteit Leuven (KU Leuven)

Verhelst
US8990073B2
CLAIM 10
A method for detecting sound activity in a sound signal, wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal, the method comprising: estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (initial phase);

wherein the tonal stability estimation is performed according to claim 1.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method for time-scaled magnitude spectrograms. The search for a good initial phase (background noise signal) estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.

US8990073B2
CLAIM 17
A method as defined in claim 16, wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability, a voicing, a non-stationarity parameter of the sound signal and a ratio between a second order (reconstruction method) and a sixteenth order of linear prediction residual error energies.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method (second order) for time-scaled magnitude spectrograms. The search for a good initial phase estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.
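For orientation, the ratio of linear prediction residual error energies recited in claim 17 can be sketched with the standard Levinson-Durbin recursion, which yields the residual energy at every predictor order as a by-product. This is a minimal sketch under stated assumptions: the function names `lp_residual_energies` and `residual_energy_ratio`, the use of autocorrelation lags as input, and the `eps` guard are illustrative choices, not taken from the patent or from the cited reference.

```python
def lp_residual_energies(autocorr, max_order):
    """Levinson-Durbin recursion.

    Returns E[0..max_order], where E[m] is the residual (prediction error)
    energy of the order-m linear predictor for the given autocorrelation lags.
    """
    if len(autocorr) <= max_order:
        raise ValueError("need max_order + 1 autocorrelation lags")
    energies = [autocorr[0]]
    coeffs = []  # coeffs[j] holds predictor coefficient a_(j+1)
    for order in range(1, max_order + 1):
        prev_energy = energies[-1]
        if prev_energy <= 0.0:
            # Perfectly predictable signal: the energy stays at zero.
            energies.append(0.0)
            continue
        acc = autocorr[order] - sum(
            coeffs[j] * autocorr[order - 1 - j] for j in range(order - 1)
        )
        k = acc / prev_energy  # reflection coefficient for this order
        coeffs = [
            coeffs[j] - k * coeffs[order - 2 - j] for j in range(order - 1)
        ] + [k]
        energies.append(prev_energy * (1.0 - k * k))
    return energies


def residual_energy_ratio(autocorr, low_order=2, high_order=16, eps=1e-12):
    """Ratio E[low_order] / E[high_order]; eps (an assumption) avoids division by zero."""
    energies = lp_residual_energies(autocorr, high_order)
    return energies[low_order] / max(energies[high_order], eps)
```

A ratio close to 1 means that raising the predictor order from 2 to 16 removes little additional energy, as for a first-order autoregressive signal; a large ratio indicates spectral detail that only a high-order predictor captures. How the ratio enters the update decision is not specified in the claim and is left out here.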

US8990073B2
CLAIM 21
A method as defined in claim 10, further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (initial phase) and prevent update of noise energy estimates on the music signal.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method for time-scaled magnitude spectrograms. The search for a good initial phase (background noise signal) estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.

US8990073B2
CLAIM 35
A device for detecting sound activity in a sound signal, wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal, the device comprising: means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (initial phase);

wherein the tonal stability parameter estimation means comprises a device according to claim 30.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method for time-scaled magnitude spectrograms. The search for a good initial phase (background noise signal) estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.

US8990073B2
CLAIM 36
A device for detecting sound activity in a sound signal, wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal, the device comprising: a tonal stability estimator of the sound signal, used for distinguishing a music signal from a background noise signal (initial phase);

wherein the tonal stability estimator comprises a device according to claim 31.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method for time-scaled magnitude spectrograms. The search for a good initial phase (background noise signal) estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.

US8990073B2
CLAIM 40
A device as defined in claim 36, further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (initial phase) and preventing update of noise energy estimates.
Overlap-add Methods For Time-scaling Of Speech. In this tutorial on time-scaling we follow one particular line of thought towards computationally efficient high quality methods. We favor time-scaling based on time-frequency representations over model based approaches, and proceed to review an iterative phase reconstruction method for time-scaled magnitude spectrograms. The search for a good initial phase (background noise signal) estimate leads us to consider synchronized overlap-add methods which are further optimized to eventually arrive at WSOLA, a technique based on a waveform similarity criterion. (C) 2000 Elsevier Science B.V. All rights reserved.




IEEE TRANSACTIONS ON SIGNAL PROCESSING. 39 (12): 2573-2592 DEC 1991

Publication Year: 1991

A TIME-DOMAIN DIGITAL COCHLEAR MODEL

Center for Res. in Speech & Hearing Sci., City Univ of New York, NY, USA

Kates
US8990073B2
CLAIM 1
A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal, the method comprising: calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map, wherein the long-term correlation map is calculated based on an update factor, the correlation map of a current frame, and an initial value (traveling wave) of the long term correlation map.
A TIME-DOMAIN DIGITAL COCHLEAR MODEL. This paper presents a digital time-domain model of the human cochlea designed to represent normal auditory functioning and to allow for degradation related to auditory impairment. The model consists of the middle ear, the mechanical motion of the cochlea, and the neural transduction of the inner hair cells. The traveling waves (initial value) on the cochlear partition are represented by a cascade of digital filter sections, and the cochlear micromechanics are represented by a second filter that further sharpens the excitation to the inner hair cells. The neural firing rate is determined by the sum of the outputs of multiple fibers attached to each inner hair cell, with the fiber neurons having firing characteristics representative of low- and high-spontaneous rate fibers. The cochlear model incorporates dynamic-range compression by adjusting the Q of each cochlear filter section and second filter in response to the second-filter velocity and the averaged neural firing rate. Examples of the model response to impulse and tone-burst stimuli and to synthetic speech are presented.
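The four steps recited in claim 1 can be sketched as follows, to make clear what a reference would need to disclose. This is a minimal illustration under loudly stated assumptions: the claim does not say how the floor is drawn through the minima (linear interpolation is assumed here), which correlation measure is used (a squared normalized correlation is assumed), or the exact update formula (first-order recursive smoothing with update factor `alpha` is assumed). All function names are hypothetical, and spectra are assumed to have at least two bins.

```python
def local_minima(values):
    """Indices of local minima; both end points are always included (assumption)."""
    last = len(values) - 1
    inner = [
        i for i in range(1, last)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]
    return [0] + inner + [last]


def residual_spectrum(spectrum):
    """Subtract a spectral floor interpolated linearly between spectral minima."""
    minima = local_minima(spectrum)
    floor = list(spectrum)
    for left, right in zip(minima, minima[1:]):
        span = right - left
        for i in range(left, right + 1):
            w = (i - left) / span
            floor[i] = (1 - w) * spectrum[left] + w * spectrum[right]
    return [s - f for s, f in zip(spectrum, floor)]


def correlation_map(current, previous):
    """Per-bin map: every peak (piece between successive minima of `current`)
    receives its squared normalized correlation with the same bins of `previous`."""
    cor_map = [0.0] * len(current)
    minima = local_minima(current)
    for left, right in zip(minima, minima[1:]):
        seg = range(left, right + 1)
        cross = sum(current[i] * previous[i] for i in seg)
        e_cur = sum(current[i] ** 2 for i in seg)
        e_prev = sum(previous[i] ** 2 for i in seg)
        value = cross * cross / (e_cur * e_prev) if e_cur > 0 and e_prev > 0 else 0.0
        for i in seg:
            cor_map[i] = value
    return cor_map


def update_long_term(long_term, cor_map, alpha=0.9):
    """Recursive smoothing; `long_term` starts from the chosen initial value."""
    return [alpha * lt + (1 - alpha) * c for lt, c in zip(long_term, cor_map)]
```

With two identical consecutive frames every peak correlates perfectly, so the long-term map rises from its initial value toward 1; with unrelated frames it decays toward 0. A tonal (music-like) signal therefore keeps the map high over many frames.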

US8990073B2
CLAIM 28
A method as defined in claim 21, wherein calculating the noise character parameter comprises: dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands;

calculating a ratio between the first and second energy values (filter section) so as to produce the noise character parameter;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter.
A TIME-DOMAIN DIGITAL COCHLEAR MODEL. This paper presents a digital time-domain model of the human cochlea designed to represent normal auditory functioning and to allow for degradation related to auditory impairment. The model consists of the middle ear, the mechanical motion of the cochlea, and the neural transduction of the inner hair cells. The traveling waves on the cochlear partition are represented by a cascade of digital filter sections (second energy values), and the cochlear micromechanics are represented by a second filter that further sharpens the excitation to the inner hair cells. The neural firing rate is determined by the sum of the outputs of multiple fibers attached to each inner hair cell, with the fiber neurons having firing characteristics representative of low- and high-spontaneous rate fibers. The cochlear model incorporates dynamic-range compression by adjusting the Q of each cochlear filter section and second filter in response to the second-filter velocity and the averaged neural firing rate. Examples of the model response to impulse and tone-burst stimuli and to synthetic speech are presented.
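The computation recited in claim 28 can be sketched as below. The sketch assumes that the ratio is taken as first-group energy over second-group energy (the claim does not fix the direction), that the group energy is a plain sum of band energies, and that the long-term value is obtained by recursive smoothing; `noise_character`, `long_term_value`, `n_first`, `alpha` and `eps` are hypothetical names.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio between the energy of the first `n_first` bands and the rest.

    The direction of the ratio is an assumption; eps avoids division by zero.
    """
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    return first_energy / max(second_energy, eps)


def long_term_value(previous, current, alpha=0.9):
    """Recursive smoothing of the per-frame parameter (assumed update rule)."""
    return alpha * previous + (1.0 - alpha) * current
```

For example, `noise_character([1.0, 1.0, 2.0, 2.0], n_first=2)` gives 0.5, and feeding that value into `long_term_value` frame after frame yields a slowly varying estimate of the spectral tilt.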

US8990073B2
CLAIM 30
A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal, the device comprising: means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map, wherein the long-term correlation map is calculated based on an update factor, the correlation map of a current frame, and an initial value (traveling wave) of the long-term correlation map.
A TIME-DOMAIN DIGITAL COCHLEAR MODEL. This paper presents a digital time-domain model of the human cochlea designed to represent normal auditory functioning and to allow for degradation related to auditory impairment. The model consists of the middle ear, the mechanical motion of the cochlea, and the neural transduction of the inner hair cells. The traveling waves (initial value) on the cochlear partition are represented by a cascade of digital filter sections, and the cochlear micromechanics are represented by a second filter that further sharpens the excitation to the inner hair cells. The neural firing rate is determined by the sum of the outputs of multiple fibers attached to each inner hair cell, with the fiber neurons having firing characteristics representative of low- and high-spontaneous rate fibers. The cochlear model incorporates dynamic-range compression by adjusting the Q of each cochlear filter section and second filter in response to the second-filter velocity and the averaged neural firing rate. Examples of the model response to impulse and tone-burst stimuli and to synthetic speech are presented.

US8990073B2
CLAIM 31
A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal, the device comprising: a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map, wherein the long-term correlation map is calculated based on an update factor, the correlation map of a current frame, and an initial value (traveling wave) of the long-term correlation map.
A TIME-DOMAIN DIGITAL COCHLEAR MODEL. This paper presents a digital time-domain model of the human cochlea designed to represent normal auditory functioning and to allow for degradation related to auditory impairment. The model consists of the middle ear, the mechanical motion of the cochlea, and the neural transduction of the inner hair cells. The traveling waves (initial value) on the cochlear partition are represented by a cascade of digital filter sections, and the cochlear micromechanics are represented by a second filter that further sharpens the excitation to the inner hair cells. The neural firing rate is determined by the sum of the outputs of multiple fibers attached to each inner hair cell, with the fiber neurons having firing characteristics representative of low- and high-spontaneous rate fibers. The cochlear model incorporates dynamic-range compression by adjusting the Q of each cochlear filter section and second filter in response to the second-filter velocity and the averaged neural firing rate. Examples of the model response to impulse and tone-burst stimuli and to synthetic speech are presented.




IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING. 14 (4): 1218-1234 JUL 2006

Publication Year: 2006

New Insights Into The Noise Reduction Wiener Filter

Lucent Technologies, Inc., Université du Québec, Katholieke Universiteit Leuven (KU Leuven)

Chen, Benesty, Huang, Doclo
US8990073B2
CLAIM 8
A method as defined in claim 6, wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (noise reduction) indicative of sound activity in the sound signal.
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction (adaptive threshold) has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.
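The comparison recited in claim 8 can be sketched as below. The claim only states that the summed long-term correlation map is compared with an adaptive threshold; it does not say how the threshold adapts. The adaptation rule shown here (step the threshold down while the sum stays above a pivot, step it up otherwise, and clamp it) is purely a hypothetical illustration, as are the function names and every numeric parameter.

```python
def strong_tone_detected(long_term_map, threshold):
    """Compare the summed long-term correlation map with the current threshold."""
    return sum(long_term_map) > threshold


def adapt_threshold(threshold, map_sum, pivot, step, lower, upper):
    """Hypothetical adaptation rule (not taken from the claim): ease the
    threshold down while the summed map exceeds `pivot`, raise it otherwise,
    and keep the result inside [lower, upper]."""
    threshold += -step if map_sum > pivot else step
    return min(max(threshold, lower), upper)
```

Lowering the threshold after a detection adds hysteresis, so a sustained tone is not lost when the summed map dips briefly; a reference that merely performs noise reduction would not, on its face, supply either function.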

US8990073B2
CLAIM 15
A method as defined in claim 14, wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio).
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.
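The frame ordering that claim 15 turns on, where the SNR for the current frame is computed with noise energy estimates left over from the previous frame, can be sketched as follows. The class name `SnrDetector`, the full-band SNR formula, the decision threshold, and the smoothing used to refresh the noise estimates are all assumptions made for illustration.

```python
import math


def frame_snr_db(band_energies, noise_estimates, eps=1e-12):
    """Full-band SNR in dB from per-band signal and noise energies (assumed measure)."""
    signal = max(sum(band_energies), eps)
    noise = max(sum(noise_estimates), eps)
    return 10.0 * math.log10(signal / noise)


class SnrDetector:
    """Keeps noise estimates across frames; each decision uses the estimates
    that were calculated while processing the previous frame."""

    def __init__(self, initial_noise, threshold_db=6.0, alpha=0.9):
        self.noise = list(initial_noise)
        self.threshold_db = threshold_db
        self.alpha = alpha

    def process(self, band_energies):
        # Decision first: self.noise still holds the previous frame's estimates.
        active = frame_snr_db(band_energies, self.noise) > self.threshold_db
        if not active:
            # Refresh the estimates for use in the NEXT frame only.
            self.noise = [
                self.alpha * n + (1.0 - self.alpha) * e
                for n, e in zip(self.noise, band_energies)
            ]
        return active
```

The point of the ordering is that an onset in the current frame cannot contaminate the noise floor it is being measured against.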

US8990073B2
CLAIM 21
A method as defined in claim 10, further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal.
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

US8990073B2
CLAIM 25
A method as defined in claim 22, wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal.
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

US8990073B2
CLAIM 26
A method as defined in claim 25, wherein calculating the activity prediction parameter (speech signal) comprises: calculating a long-term value of a binary decision (linear prediction coefficient) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter.
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients (binary decision)) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

US8990073B2
CLAIM 27
A method as defined in claim 25, wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold.
New Insights Into The Noise Reduction Wiener Filter. The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion.
When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. In case that we have multiple microphone sensors, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion.

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
New Insights Into The Noise Reduction Wiener Filter . The problem of noise reduction has attracted a considerable amount of research attention over the past several decades . Among the numerous techniques that were developed , the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches , which has been delineated in different forms and adopted in various applications . Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility) , few efforts have been reported to show the inherent relationship between noise reduction and speech distortion . By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated , this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction . We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter) , indicating that the Wiener filter is always able to achieve noise reduction . However , the amount of noise reduction is in general proportional to the amount of speech degradation . This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion . Fortunately , we show that speech distortion can be better managed in three different ways . If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal , this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion .
When no a priori knowledge is available , we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter , resulting in a suboptimal Wiener filter . In case that we have multiple microphone sensors , the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion .
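The steps recited in claim 28 (split the bands into two groups, sum the energy of each group, take the ratio, then smooth it over time) can be sketched as below. This is a minimal illustration only, not the patent's implementation: the function names, the split index `n_first`, the smoothing factor `alpha`, the `eps` guard, and the first-order recursive form of the long-term value are all assumptions introduced here.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must split the bands into two non-empty groups")
    first = sum(band_energies[:n_first])    # first energy value
    second = sum(band_energies[n_first:])   # second energy value
    return first / (second + eps)           # eps avoids division by zero (assumption)


def long_term_value(previous_lt, current, alpha=0.9):
    """First-order recursive average, an assumed form of the long-term value."""
    return alpha * previous_lt + (1.0 - alpha) * current


nc = noise_character([1.0, 1.0, 2.0, 2.0], n_first=2)
nc_lt = long_term_value(previous_lt=1.0, current=nc)
```

The direction of the ratio (first group over second group) follows the claim wording; the actual band split and smoothing constants would have to be taken from the patent specification.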

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
New Insights Into The Noise Reduction Wiener Filter . The problem of noise reduction has attracted a considerable amount of research attention over the past several decades . Among the numerous techniques that were developed , the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches , which has been delineated in different forms and adopted in various applications . Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (noise character parameter, activity prediction parameter) (appreciable or even significant degradation in quality or intelligibility) , few efforts have been reported to show the inherent relationship between noise reduction and speech distortion . By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated , this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction . We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter) , indicating that the Wiener filter is always able to achieve noise reduction . However , the amount of noise reduction is in general proportional to the amount of speech degradation . This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion . Fortunately , we show that speech distortion can be better managed in three different ways . If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal , this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion .
When no a priori knowledge is available , we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter , resulting in a suboptimal Wiener filter . In case that we have multiple microphone sensors , the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
New Insights Into The Noise Reduction Wiener Filter . The problem of noise reduction has attracted a considerable amount of research attention over the past several decades . Among the numerous techniques that were developed , the optimal Wiener filter can be considered as one of the most fundamental noise reduction approaches , which has been delineated in different forms and adopted in various applications . Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility) , few efforts have been reported to show the inherent relationship between noise reduction and speech distortion . By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated , this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction . We show that in the single-channel case the a posteriori signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter) , indicating that the Wiener filter is always able to achieve noise reduction . However , the amount of noise reduction is in general proportional to the amount of speech degradation . This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion . Fortunately , we show that speech distortion can be better managed in three different ways . If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal , this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion .
When no a priori knowledge is available , we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter , resulting in a suboptimal Wiener filter . In case that we have multiple microphone sensors , the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion .
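The comparator recited in claim 38 can be illustrated with a short sketch. The claim only requires that the threshold be a function of the long-term SNR; the linear form used here, together with the names `vad_threshold`, `is_active`, `slope`, and `offset` and their default values, is a hypothetical choice made for illustration and is not taken from the patent or from the cited reference.

```python
def vad_threshold(snr_lt, slope=0.1, offset=2.0):
    """Hypothetical linear threshold as a function of the long-term SNR (dB)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Comparator: the frame is treated as active when SNR_av exceeds the threshold."""
    return snr_av > vad_threshold(snr_lt)


# With SNR_LT = 30 dB the assumed threshold is about 5 dB.
loud_frame = is_active(snr_av=10.0, snr_lt=30.0)
quiet_frame = is_active(snr_av=3.0, snr_lt=30.0)
```

Making the threshold depend on the long-term SNR lets the decision adapt to the prevailing noise level rather than relying on one fixed value.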




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: 237-240, 2004

Publication Year: 2004

Noise Suppression For Automotive Applications Based On Directional Information

TEMIC Speech Dialog Systems (SDS), Ulm (Germany)

Fuchs, Haulick, Schmidt, IEEE
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
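The last two steps of claim 1 (a per-peak correlation map, followed by a long-term map driven by an update factor) can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the normalized squared cross-correlation, the first-order recursive update, the treatment of the end points as minima, and every name and default value (`find_minima`, `correlation_map`, `update_long_term`, `alpha`, `eps`) are introduced here for clarity. The inputs are assumed to be non-empty residual spectra of equal length.

```python
def find_minima(x):
    """Indices of local minima; both end points are always included (assumption)."""
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] <= x[i - 1] and x[i] < x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def correlation_map(cur_res, prev_res, eps=1e-12):
    """Correlate each peak of the current residual spectrum (the span between
    successive minima) with the same bins of the previous residual spectrum."""
    cmap = [0.0] * len(cur_res)
    minima = find_minima(cur_res)
    for lo, hi in zip(minima, minima[1:]):
        cur = cur_res[lo:hi + 1]
        prev = prev_res[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / (den + eps)          # normalized to roughly [0, 1]
        for k in range(lo, hi + 1):
            cmap[k] = value                # every bin of the peak gets the peak's value
    return cmap


def update_long_term(lt_map, cmap, alpha=0.9):
    """Recursive update from the update factor alpha, the current-frame map,
    and the previous long-term map (initialised, for example, to zeros)."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


cur = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 4.0, 2.0, 0.0]
lt = update_long_term([0.0] * len(cur), correlation_map(cur, cur))
```

A peak that keeps the same shape from frame to frame yields a map value near 1, so the long-term map grows toward 1 for stable tones and decays for bins whose content changes.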

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (noise attenuation) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
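The three steps of claim 2 (search for minima, connect them to estimate the spectral floor, subtract the floor) can be illustrated as follows. This is a hedged sketch rather than the patent's procedure: connecting the minima by straight lines, treating both end points as minima, and the function name `residual_spectrum` are assumptions made here.

```python
def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor drawn through the minima."""
    n = len(spectrum)
    if n < 2:
        return [0.0] * n
    # Step 1: search for the minima; end points count as minima (assumption).
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] <= spectrum[i - 1] and spectrum[i] < spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    # Step 2: connect successive minima with straight lines to estimate the floor.
    floor = [0.0] * n
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    # Step 3: subtract the floor from the spectrum.
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([1.0, 3.0, 1.0, 5.0, 3.0])
```

By construction the residual is zero at every minimum, so what remains between successive minima are the peaks that the later claim steps operate on.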

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise attenuation) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise attenuation) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
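The search described in claim 7 amounts to a threshold scan over the correlation map. A minimal sketch, assuming the map is a list of per-bin values; the name `strong_tone_bins` and the default threshold of 0.95 are hypothetical and not taken from the patent:

```python
def strong_tone_bins(cmap, threshold=0.95):
    """Return indices of correlation-map bins whose magnitude exceeds the fixed threshold."""
    return [k for k, value in enumerate(cmap) if abs(value) > threshold]


bins = strong_tone_bins([0.10, 0.97, 0.50, 0.99])
```

A non-empty result would indicate at least one strong tone in the frame under this reading of the claim.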

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise attenuation) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (spatial information) when a tonal sound signal (noise attenuation) is detected .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (noise attenuation) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (noise attenuation) based on a frequency dependent signal-to-noise ratio (SNR) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (noise attenuation) further comprises using noise energy estimates (spatial information) calculated in a previous frame in a SNR calculation (noise ratio) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (spatial information) for a next frame .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
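Claims 15 and 16 describe a frame loop in which the SNR is computed with noise estimates carried over from the previous frame, after which the estimates are updated for the next frame. The sketch below illustrates that ordering only. The averaging of per-band ratios, the recursive update with factor `alpha`, the `allow_update` gate, and all names are assumptions made here, not details from the patent.

```python
import math


def frame_snr_db(band_energies, noise_estimates, eps=1e-12):
    """Average SNR in dB, using noise estimates from the previous frame."""
    ratios = [e / (n + eps) for e, n in zip(band_energies, noise_estimates)]
    return 10.0 * math.log10(sum(ratios) / len(ratios) + eps)


def update_noise(noise_estimates, band_energies, allow_update, alpha=0.9):
    """Noise estimates for the next frame; unchanged when the update is blocked."""
    if not allow_update:
        return list(noise_estimates)
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(noise_estimates, band_energies)]


noise = [1.0, 1.0]                      # estimates from the previous frame
frame = [10.0, 10.0]                    # band energies of the current frame
snr = frame_snr_db(frame, noise)        # computed before any update
noise = update_noise(noise, frame, allow_update=False)  # e.g. blocked on active sound
```

Computing the SNR before the update keeps the current frame from contaminating the noise estimate it is being compared against.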

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (spatial information) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise attenuation) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise attenuation) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise attenuation) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise attenuation) prevents updating of noise energy estimates (spatial information) when a music signal is detected .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (spatial information) on the music signal .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (adaptive beam) and an average frame energy (adaptive beam) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beam (current frame energy, average frame energy) formers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise attenuation) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
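For orientation only, the computation recited in claim 24 (a per-band energy ratio between the current and previous frame, summed with weights over the bands above a given number) can be sketched in Python as follows. The function name, the uniform default weights and the small energy floor are illustrative assumptions; the claim itself does not fix the weights or the band number, and this sketch is not taken from the patent specification or from the cited reference.

```python
def spectral_diversity(e_cur, e_prev, min_band, weights=None, floor=1e-12):
    """Weighted sum of current/previous energy ratios over bands > min_band.

    e_cur, e_prev: per-band energies of the current and previous frame.
    min_band:      the "given number"; only bands with a higher index are used.
    weights:       optional per-band weights (assumption: all 1.0 by default).
    floor:         hypothetical guard against division by zero.
    """
    if len(e_cur) != len(e_prev):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for band in range(min_band + 1, len(e_cur)):
        ratio = e_cur[band] / max(e_prev[band], floor)
        total += weights[band] * ratio
    return total


# Usage: four bands, only bands 2 and 3 (index > 1) contribute.
diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], min_band=1)
```

A larger value indicates that the energy in the upper bands changed more between the two frames.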

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (noise attenuation) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise attenuation) and the complementary non-stationarity parameter .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
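One common way to obtain a "long-term value of a binary decision", as recited in claim 26, is first-order recursive smoothing of the 0/1 flag. The sketch below assumes that approach; the function name and the forgetting factor alpha are hypothetical and are not stated in the claim.

```python
def update_activity_prediction(act_pred, decision, alpha=0.9):
    """Smooth a binary per-frame decision into a long-term value in [0, 1].

    act_pred: long-term value carried over from the previous frame.
    decision: True/False decision for the current frame.
    alpha:    hypothetical forgetting factor (closer to 1 means slower change).
    """
    return alpha * act_pred + (1.0 - alpha) * (1.0 if decision else 0.0)


# Usage: run three frames with decisions True, True, False.
value = 0.0
for flag in (True, True, False):
    value = update_activity_prediction(value, flag, alpha=0.5)
```

The smoothed value rises while the decision stays positive and decays once it turns negative, which is what makes it usable as a prediction of continued activity.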

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (spatial information) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
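The gating condition of claim 27 (update prevented only when both parameters exceed their own fixed thresholds at the same time) can be illustrated with the short Python sketch below. The recursive noise update, the factor beta and all names are assumptions added for illustration; the claim only specifies the condition under which the update is prevented.

```python
def update_noise_energy(noise, frame_energy, act_pred, nonstat,
                        thr_act, thr_nonstat, beta=0.9):
    """Return the new noise energy estimate, or the old one if the update is blocked.

    The update is blocked when act_pred > thr_act AND nonstat > thr_nonstat.
    beta is a hypothetical smoothing factor for the (assumed) recursive update.
    """
    prevent = act_pred > thr_act and nonstat > thr_nonstat
    if prevent:
        return noise
    return beta * noise + (1.0 - beta) * frame_energy


# Usage: both parameters are above their thresholds, so the estimate is kept.
kept = update_noise_energy(2.0, 4.0, act_pred=0.9, nonstat=6.0,
                           thr_act=0.8, thr_nonstat=5.0, beta=0.5)
```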

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (spatial information) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (noise attenuation) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise attenuation) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .
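The comparator of claim 38 compares an average SNR with a threshold that itself depends on the long-term SNR. A minimal Python sketch is given below, assuming a linear threshold function; the linear form, the slope and offset values and the function names are hypothetical, since the claim only requires that the threshold be some function of the long-term SNR.

```python
def snr_threshold(snr_lt, slope=0.5, offset=1.0):
    """Hypothetical linear mapping from long-term SNR to a decision threshold."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Classify the frame as active when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt)


# Usage: with a long-term SNR of 20 the threshold is 11, so 12 is active and 10 is not.
flags = (is_active(12.0, 20.0), is_active(10.0, 20.0))
```

Making the threshold depend on the long-term SNR lets the detector be stricter in clean conditions and more permissive in noisy ones, or the reverse, depending on the sign of the slope.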

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (spatial information) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise attenuation) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (spatial information) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information (noise energy estimates, noise estimates, updating noise energy estimates) from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise attenuation) .
Noise Suppression For Automotive Applications Based On Directional Information . In noise suppression systems for automotive applications the use of adaptive beamformers has proven to be of great potential . Nevertheless , in diffuse noise fields the amount of noise attenuation (sound signal) is rather limited and depends on the number of microphones . In order to enhance the signal-to-noise ratio further , additional classical noise suppression schemes , like spectral subtraction , are often applied . Unfortunately , these schemes tend to introduce either speech distortions or leave a large amount of residual noise . In this paper we describe a method of extracting additional spatial information from a conventional beamformer in generalized sidelobe structure . This spatial information can be utilized , e . g . , to control parameters like overestimation or spectral floor of classical noise suppression schemes in a frequency selective manner or to compute a simple attenuation factor for suppressing nonstationary noise . An outlook is given on further usage of the spatial information in other algorithmic parts of a hands-free telephone or a speech recognition system .




IEEE TRANSACTIONS ON SIGNAL PROCESSING. 52 (5): 1149-1160 MAY 2004

Publication Year: 2004

Multichannel Post-filtering In Nonstationary Noise Environments

The Technion – Israel Institute of Technology (הטכניון – מכון טכנולוגי לישראל)

Cohen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long term correlation map .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density (initial value) , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .
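To make the last two elements of claim 1 concrete, the Python sketch below computes a per-peak correlation map between the current and previous residual spectra and then folds it into a long-term map using an update factor. It is an illustration under stated assumptions, not the patented implementation: the normalized squared cross-correlation, the inclusion of the first and last bins as peak boundaries, the zero initial map and all function names are hypothetical choices.

```python
def local_minima(x):
    """Indices of local minima; first and last bins are always included (assumption)."""
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def correlation_map(res_cur, res_prev):
    """Correlate each peak of res_cur with the same bins of res_prev.

    A peak is the piece of res_cur between two successive minima. Every bin of a
    peak receives that peak's normalized squared correlation, a value in [0, 1].
    A bin shared by two peaks keeps the value of the later peak.
    """
    cor = [0.0] * len(res_cur)
    minima = local_minima(res_cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(res_cur[k] * res_prev[k] for k in bins)
        den = sum(res_cur[k] ** 2 for k in bins) * sum(res_prev[k] ** 2 for k in bins)
        value = (num * num) / den if den > 0.0 else 0.0
        for k in bins:
            cor[k] = value
    return cor


def update_long_term_map(lt_map, cor_map, alpha):
    """Recursive update: alpha is the update factor, lt_map carries the initial value."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cor_map)]


# Usage: identical residual spectra in two frames give a correlation of 1 per peak.
residual = [0.0, 2.0, 0.0, 3.0, 0.0]
cor = correlation_map(residual, residual)
lt = update_long_term_map([0.0] * len(residual), cor, alpha=0.5)
```

Peaks that stay in the same position from frame to frame drive the long-term map toward 1, which is the behaviour the claim associates with tonal stability.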

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .
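The three steps of claim 2 (search for minima, connect them to form a spectral floor, subtract the floor) can be sketched in Python as below. Connecting the minima by straight lines and treating the first and last bins as minima are assumptions made for this example; the claim only says the minima are connected with each other.

```python
def find_minima(spec):
    """Indices of local minima; first and last bins are always included (assumption)."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def residual_spectrum(spec):
    """Subtract a piecewise-linear floor through the minima from the spectrum."""
    if len(spec) < 2:
        raise ValueError("need at least two frequency bins")
    minima = find_minima(spec)
    floor = [0.0] * len(spec)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)  # position between the two minima, 0..1
            floor[k] = spec[a] + t * (spec[b] - spec[a])
    return [s - f for s, f in zip(spec, floor)]


# Usage: the floor passes through bins 0, 2 and 4, so only the peaks remain.
residual = residual_spectrum([1.0, 3.0, 1.0, 5.0, 3.0])
```

By construction the residual is zero at every minimum, so each piece between two successive minima isolates one spectral peak.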

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise component) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
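
The search recited in claim 7 amounts to a per-bin comparison against a constant. A minimal sketch, assuming the correlation map is available as a list of per-bin values; the function name `strong_tone_bins` and the default threshold of 0.9 are hypothetical and are not taken from the patent.

```python
def strong_tone_bins(correlation_map, threshold=0.9):
    """Return the indices of bins whose magnitude exceeds a fixed threshold.

    `threshold` is a placeholder value chosen for illustration only.
    """
    return [k for k, value in enumerate(correlation_map) if abs(value) > threshold]
```

A non-empty result would indicate that at least one strong tone was found in the frame.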
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (noise component) in the sound signal .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 10
. A method for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (proposed method) when a tonal sound signal (noise component) is detected .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (noise component) in the sound signal (noise component) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises detecting the sound signal (noise component) based on a frequency dependent signal-to-noise ratio (SNR) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
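
The comparison recited in claim 14, together with the inactive and active outcomes of claims 18 and 19, can be sketched as below. The claim only states that the threshold is a function of the long-term SNR; the linear form and the `slope` and `offset` values used here are hypothetical placeholders, as are the function names.

```python
def snr_threshold(snr_lt, slope=0.5, offset=10.0):
    """Threshold as a function of the long-term SNR (SNR LT).

    The linear form and its coefficients are illustrative assumptions.
    """
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Classify a frame from its average SNR (SNR av).

    Returns True (active) when SNR av is larger than the threshold and
    False (inactive) otherwise.
    """
    return snr_av > snr_threshold(snr_lt)
```

Tying the threshold to the long-term SNR lets the detector adapt its decision level to the prevailing noise conditions rather than relying on a single fixed level.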
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection in the sound signal (noise component) further comprises using noise energy estimates (proposed method) calculated in a previous frame in a SNR calculation (noise environment, noise power) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments (SNR calculation, SNR LT) . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power (SNR calculation, SNR LT) spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection further comprises updating the noise estimates for a next frame .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (proposed method) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise component) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise component) prevents updating of noise energy estimates (proposed method) when a music signal is detected .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (proposed method) on the music signal .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise component) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
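
The two steps of claim 24 (a per-band energy ratio between the current and previous frame, then a weighted sum over the upper bands) can be sketched as follows. This is an illustration under stated assumptions: the name `spectral_diversity`, the plain current/previous ratio, the uniform default weights, and the small `eps` guard against division by zero are all choices made here and are not confirmed by the claim text.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands above `min_band`.

    `curr_energy` and `prev_energy` hold one energy value per frequency band
    for the current and previous frame. Only bands with an index strictly
    higher than `min_band` contribute.
    """
    n = len(curr_energy)
    if weights is None:
        weights = [1.0] * n  # uniform weights (assumption)
    total = 0.0
    for i in range(min_band + 1, n):
        ratio = curr_energy[i] / (prev_energy[i] + eps)  # eps avoids dividing by zero
        total += weights[i] * ratio
    return total
```

Restricting the sum to the upper bands follows the claim wording "frequency bands higher than a given number"; which band index is used as the cut-off is left as a parameter here.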
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (noise component) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise component) and the complementary non-stationarity parameter .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .
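The gating condition of claim 27 is a conjunction of two threshold tests. A minimal sketch follows, assuming hypothetical threshold values (`thr_activity`, `thr_nonstat`) chosen only for illustration; the claim requires fixed thresholds but does not state their values here.

```python
def noise_update_allowed(activity_pred, nonstat, thr_activity=0.8, thr_nonstat=1.5):
    """Return False when the noise energy update should be prevented.

    The update is blocked only when BOTH parameters exceed their fixed
    thresholds at the same time, following the wording of claim 27.
    Both threshold values are placeholders, not values from the patent.
    """
    blocked = activity_pred > thr_activity and nonstat > thr_nonstat
    return not blocked
```

Exceeding only one of the two thresholds leaves the update enabled under this reading.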

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (proposed method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .
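The four steps of claim 28 (split the bands, compute two energy values, take their ratio, smooth the ratio over time) can be sketched as below. This is an illustration of the claim wording only. Using a plain sum as the group energy, the smoothing factor `alpha`, and the small `eps` guard against division by zero are assumptions.

```python
def noise_character(band_energies, n_first, prev_long_term, alpha=0.9, eps=1e-12):
    """Ratio of first-group to second-group band energy, plus a long-term value.

    band_energies: per-band energies for the current frame.
    n_first: number of bands placed in the first group (the rest form the second).
    Summing per group, alpha, and eps are illustrative assumptions.
    """
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    ratio = first_energy / (second_energy + eps)
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term


ratio, long_term = noise_character([1.0, 1.0, 2.0, 2.0], n_first=2,
                                   prev_long_term=0.0, alpha=0.5)
```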

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long-term correlation map .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density (initial value) , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long-term correlation map .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density (initial value) , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise component) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 35
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 36
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (noise component) detector .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (noise component) detector comprises a comparator of an average signal (noise component) to noise ratio (clean signal) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal (noise ratio) . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .
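The comparator of claim 38 tests an average SNR against a threshold that itself depends on the long-term SNR. A minimal sketch is given below. The linear form of the threshold and the `slope` and `offset` values are assumptions made purely for illustration, since the claim only requires that the threshold be some function of SNR LT.

```python
def snr_threshold(snr_long_term, slope=0.1, offset=2.0):
    """Hypothetical threshold as a linear function of the long-term SNR (dB)."""
    return slope * snr_long_term + offset


def is_sound_active(snr_average, snr_long_term):
    """Declare activity when the average SNR exceeds the adaptive threshold."""
    return snr_average > snr_threshold(snr_long_term)
```

Under this sketch a cleaner long-term condition raises the bar that a frame must clear to count as active.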

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (proposed method) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (noise component) detector .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise component) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (proposed method) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method (noise energy estimates, second energy values) is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise component) .
Multichannel Post-filtering In Nonstationary Noise Environments . In this paper , we present a multichannel post-filtering approach for minimizing the log-spectral amplitude distortion in nonstationary noise environments . The beamformer is realistically assumed to have a steering error , a blocking matrix that is unable to block all of the desired signal components , and a noise canceller that is adapted to the pseudo-stationary noise but not modified during transient interferences . A mild assumption is made that a desired signal component is stronger at the beamformer output than at any reference noise signal , and a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) is strongest at one of the reference signals . The ratio between the transient power at the beamformer output and the transient power at the reference noise signals is used to indicate whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , we derive estimators for the signal presence probability , the noise power spectral density , and the clean signal . The proposed method is tested in various nonstationary noise environments . Compared with single-channel post-filtering , a significantly reduced level of nonstationary noise is achieved without further distorting the desired signal components .




2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS. : 901-904 2002

Publication Year: 2002

Microphone Array Post-filtering For Non-stationary Noise Suppression

Lamar Signal Processing Ltd

Cohen, Berdugo
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (microphone array, noise component) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (additional reduction) of the long term correlation map .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction (initial value) of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
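To make the elements of claim 1 concrete, the sketch below walks through peak detection, the per-peak correlation map, and the long-term update. It is an illustration of the claim wording under stated assumptions, not the patented implementation. The normalized squared cross-correlation, the zero initial value of the long-term map, the update factor `alpha`, and all function names are assumptions. The inputs are residual spectra that have already had the spectral floor subtracted.

```python
def peaks_between_minima(residual):
    """Return (start, end) bin pairs spanning successive minima.

    The two end bins are treated as minima, which is an assumption.
    """
    inner = [i for i in range(1, len(residual) - 1)
             if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]]
    minima = [0] + inner + [len(residual) - 1]
    return list(zip(minima, minima[1:]))


def correlation_map(current, previous):
    """Correlate each peak with the same bins of the previous residual spectrum."""
    cmap = [0.0] * len(current)
    for start, end in peaks_between_minima(current):
        x = current[start:end + 1]
        y = previous[start:end + 1]
        num = sum(p * q for p, q in zip(x, y)) ** 2
        den = sum(p * p for p in x) * sum(q * q for q in y)
        value = num / den if den > 0 else 0.0
        # A shared boundary bin takes the value of the later peak.
        for i in range(start, end + 1):
            cmap[i] = value
    return cmap


def update_long_term_map(long_term, cmap, alpha=0.9):
    """Recursive update using an update factor and the current-frame map."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(long_term, cmap)]


# Usage: identical residual spectra in two frames give full correlation.
frame = [0.0, 2.0, 0.0, 3.5, 0.0]
cmap = correlation_map(frame, frame)
long_term = update_long_term_map([0.0] * len(frame), cmap, alpha=0.5)
```

A tonal stability decision could then compare, for example, the sum of the long-term map against a threshold; that final rule is not sketched here because the claim does not fix it.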

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (microphone array, noise component) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
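The three steps of claim 2 (search for minima, connect them to form a spectral floor, subtract the floor) can be sketched as follows. Connecting the minima by straight-line interpolation and treating the two end bins as minima are assumptions for illustration; the claim only says the minima are connected "with each other".

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included by assumption."""
    inner = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]
    return [0] + inner + [len(spectrum) - 1]


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the spectral minima."""
    if len(spectrum) < 2:
        return [0.0] * len(spectrum)
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
```

In this example the minima fall at bins 0, 2 and 4, so the residual is zero there and keeps only the two peaks.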

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (microphone array, noise component) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array, noise component) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array, noise component) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (microphone array, noise component) in the sound signal .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
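
The two strong-tone tests recited in claims 7 and 8 reduce to simple comparisons, sketched below. The function names, the example threshold values, and the way the adaptive threshold is supplied (passed in by the caller rather than computed) are hypothetical; the claims do not fix any of them.

```python
def strong_tone_bins(cor_map, fixed_threshold):
    """Claim 7 style test: bins of the correlation map above a fixed threshold."""
    return [j for j, value in enumerate(cor_map) if value > fixed_threshold]


def tonal_by_summed_map(long_term_cor_map, adaptive_threshold):
    """Claim 8 style test: summed long-term correlation map vs. an adaptive threshold.

    How the adaptive threshold tracks sound activity is not modelled here;
    the caller is assumed to provide its current value.
    """
    return sum(long_term_cor_map) > adaptive_threshold


bins = strong_tone_bins([0.10, 0.97, 0.50, 0.99], fixed_threshold=0.95)
is_tonal = tonal_by_summed_map([1.0, 2.0, 3.0], adaptive_threshold=5.0)
```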

US8990073B2
CLAIM 10
. A method for detecting sound activity (microphone array, noise component) in a sound signal (microphone array, noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (proposed method) when a tonal sound signal (microphone array, noise component) is detected .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (microphone array, noise component) in the sound signal (microphone array, noise component) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array, noise component) detection comprises detecting the sound signal (microphone array, noise component) based on a frequency dependent signal-to-noise ratio (SNR) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array, noise component) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array, noise component) detection in the sound signal (microphone array, noise component) further comprises using noise energy estimates (proposed method) calculated in a previous frame in a SNR calculation .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array, noise component) detection further comprises updating the noise estimates for a next frame .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (proposed method) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (microphone array, noise component) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array, noise component) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array, noise component) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
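
Claims 14, 18 and 19 together describe a decision rule: compare the average SNR with a threshold derived from the long-term SNR, and label the frame inactive or active accordingly. A minimal sketch follows. The linear form of the threshold and its `slope` and `offset` values are purely hypothetical, since the claims only require that the threshold be "a function of" the long-term SNR; treating the equality case as inactive is likewise an assumption.

```python
def vad_threshold(snr_lt, slope=0.5, offset=1.0):
    """Hypothetical threshold as a linear function of the long-term SNR."""
    return slope * snr_lt + offset


def classify_frame(snr_av, snr_lt):
    """Return 'active' when the average SNR is larger than the threshold,
    otherwise 'inactive' (equality is treated as inactive by assumption)."""
    return "active" if snr_av > vad_threshold(snr_lt) else "inactive"


label_loud = classify_frame(snr_av=12.0, snr_lt=20.0)   # threshold is 11.0 here
label_quiet = classify_frame(snr_av=8.0, snr_lt=20.0)
```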

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (microphone array, noise component) prevents updating of noise energy estimates (proposed method) when a music signal is detected .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (proposed method) on the music signal .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (adaptive beam) and an average frame energy (adaptive beam) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array post-filtering approach , applicable to adaptive beam (current frame energy, average frame energy) former , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (microphone array, noise component) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
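
The spectral diversity computation recited in claim 24 can be illustrated with a short sketch. It follows the claim wording literally (ratio of current to previous band energy, weighted sum over bands above a given index); the function name, the weight values, and the small `eps` guard against division by zero are assumptions added for the example.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    curr_energy / prev_energy: per-band energies of the current and previous
    frame. `weights` and `eps` are illustrative assumptions.
    """
    total = 0.0
    for i in range(min_band + 1, len(curr_energy)):
        # Ratio between the band energy in the current and the previous frame.
        ratio = curr_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total


diversity = spectral_diversity(
    curr_energy=[1.0, 2.0, 4.0, 8.0],
    prev_energy=[1.0, 1.0, 2.0, 2.0],
    min_band=1,
    weights=[1.0, 1.0, 0.5, 0.25],
)
```

Only bands 2 and 3 contribute in this example, giving 0.5 × 2.0 + 0.25 × 4.0.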

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (microphone array, noise component) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (microphone array, noise component) and the complementary non-stationarity parameter .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise component (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) s at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .
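
The gating condition recited in claim 27 can be illustrated with a minimal sketch: the noise-energy update is skipped only when both parameters exceed their thresholds at the same time. The function names, the threshold values and the smoothing factor below are illustrative assumptions and are not taken from the patent or from the cited reference.

```python
def noise_update_allowed(act_pred, nonstat, act_thresh=0.8, nonstat_thresh=5.0):
    """Return False when the update must be prevented (claim 27 condition).

    act_thresh and nonstat_thresh are hypothetical fixed thresholds.
    """
    prevent = act_pred > act_thresh and nonstat > nonstat_thresh
    return not prevent


def update_noise_estimates(noise, frame_energy, act_pred, nonstat, alpha=0.9):
    """Smooth per-band noise energies toward the current frame unless gated.

    alpha is an assumed smoothing factor; the claim does not specify one.
    """
    if not noise_update_allowed(act_pred, nonstat):
        return list(noise)  # update prevented: keep the previous estimates
    return [alpha * n + (1.0 - alpha) * e for n, e in zip(noise, frame_energy)]
```

With these assumed values, a frame with act_pred = 0.9 and nonstat = 6.0 leaves the noise estimates unchanged, while lowering either parameter below its threshold lets the update proceed.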

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (proposed method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .
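
The steps of claim 28 (split the bands into two groups, compute one energy value per group, take their ratio, then track a long-term value) can be sketched as follows. This is only one possible reading: summing the band energies within each group, the first-order recursive average and the value of alpha are assumptions, as are all names.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy of the first n_first bands to the energy of the rest.

    Summation per group is an assumption; the claim only says 'energy value'.
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must split the bands into two non-empty groups")
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    eps = 1e-12  # assumed guard against division by zero
    return first / (second + eps)


def long_term_update(previous, current, alpha=0.9):
    """First-order recursive average, an assumed form of the 'long-term value'."""
    return alpha * previous + (1.0 - alpha) * current
```

For example, noise_character([1.0, 1.0, 2.0, 4.0], 2) compares an energy of 2 in the first group against 6 in the second, giving a ratio of about one third.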

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (microphone array, noise component) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (additional reduction) of the long-term correlation map .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction (initial value) of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
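
To make the peak, correlation-map and long-term-map elements of claim 30 concrete, the sketch below takes the current and previous residual spectra as given inputs. It is an illustration under stated assumptions, not the patented implementation: the squared normalized cross-correlation per peak, the treatment of the two end bins as segment boundaries, the update factor alpha = 0.9 and the zero initial map are all hypothetical choices.

```python
def local_minima(x):
    """Indices of local minima, with both end bins included as boundaries."""
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def correlation_map(cur, prev):
    """Per-bin map: every bin of a peak receives that peak's correlation value."""
    cmap = [0.0] * len(cur)
    mins = local_minima(cur)
    for lo, hi in zip(mins[:-1], mins[1:]):
        seg = range(lo, hi)  # one peak: bins between two successive minima
        num = sum(cur[i] * prev[i] for i in seg) ** 2
        den = sum(cur[i] ** 2 for i in seg) * sum(prev[i] ** 2 for i in seg)
        value = num / den if den > 0.0 else 0.0
        for i in seg:
            cmap[i] = value
    return cmap  # the final boundary bin keeps 0.0 in this sketch


def update_long_term_map(lt_map, cmap, alpha=0.9):
    """Recursive update from the previous long-term map and the current map."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]
```

Starting from an all-zero initial map, a spectrum whose peaks keep the same shape from frame to frame drives the long-term values toward 1, whereas peaks that change shape or position keep them low.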

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (microphone array, noise component) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (additional reduction) of the long-term correlation map .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction (initial value) of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (microphone array, noise component) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
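
The three elements of claim 32 (locate the minima, connect them to form a spectral floor, subtract the floor) can be sketched as below. Connecting the minima by straight lines and treating the first and last bins as minima are assumptions made for illustration; the names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; the first and last bins are assumed boundaries."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the spectral minima."""
    if len(spectrum) < 2:
        return [0.0] * len(spectrum)
    mins = find_minima(spectrum)
    floor = list(spectrum)
    for lo, hi in zip(mins[:-1], mins[1:]):
        step = (spectrum[hi] - spectrum[lo]) / (hi - lo)
        for i in range(lo, hi + 1):
            floor[i] = spectrum[lo] + step * (i - lo)  # linear interpolation
    return [s - f for s, f in zip(spectrum, floor)]
```

For the spectrum [2.0, 5.0, 2.0, 6.0, 4.0] the minima are at bins 0, 2 and 4, the floor is [2, 2, 2, 3, 4], and the residual is zero at each minimum with one peak between each pair of minima.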

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (microphone array, noise component) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 35
. A device for detecting sound activity (microphone array, noise component) in a sound signal (microphone array, noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 36
. A device for detecting sound activity (microphone array, noise component) in a sound signal (microphone array, noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (microphone array, noise component) detector .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (microphone array, noise component) detector comprises a comparator of an average signal (microphone array, noise component) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .
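
The comparator of claim 38 can be illustrated with a short sketch in which the decision threshold depends on the long-term SNR. The linear threshold function and its slope and offset are purely hypothetical; the claim only requires that the threshold be some function of SNR LT.

```python
def sad_threshold(snr_lt, slope=0.5, offset=1.0):
    """Hypothetical linear threshold as a function of the long-term SNR (dB)."""
    return slope * snr_lt + offset


def snr_based_activity(snr_av, snr_lt):
    """Declare the frame active when the average SNR exceeds the threshold."""
    return snr_av > sad_threshold(snr_lt)
```

With these assumed coefficients, a long-term SNR of 20 dB yields a threshold of 11 dB, so a frame with an average SNR of 12 dB is classified as active and one with 10 dB as inactive.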

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (proposed method) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (microphone array, noise component) detector .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (microphone array, noise component) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (proposed method) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method (noise energy estimates, second energy values) .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (microphone array, noise component) .
Microphone Array Post-filtering For Non-stationary Noise Suppression . Microphone array post-filtering allows additional reduction of noise components (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) at a beamformer output . Existing techniques are either restricted to classical delay-and-sum beamformers , or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components . In this paper , we introduce a microphone array (sound signal, sound activity, sound activity detector, average signal, sound activity detection, detecting sound activity, sound signal prevents updating) post-filtering approach , applicable to adaptive beamformer , that differentiates nonstationary noise components from speech components . The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering . Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique , a significantly reduced level of non-stationary noise is achieved without further distorting speech components . Experimental results demonstrate the effectiveness of the proposed method .




JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA. 110 (6): 3218-3231 DEC 2001

Publication Year: 2001

A Two-microphone Dual Delay-line Approach For Extraction Of A Speech Sound In The Presence Of Multiple Interferers

University of Illinois, Motorola Labs

Liu, Wheeler, O'brien, Lansing, Bilger, Jones, Feng
US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (sound sources) .
A Two-microphone Dual Delay-line Approach For Extraction Of A Speech Sound In The Presence Of Multiple Interferers . This paper describes algorithms for signal extraction for use as a front-end of telecommunication devices , speech recognition systems , as well as hearing aids that operate in noisy environment . The development was based on some independent , hypothesized theories of the computational mechanics of biological systems in which directional hearing is enabled mainly by binaural processing of interaural directional cues . Our system uses two microphones as input devices and a signal processing method based on the two input channels . The signal processing procedure comprises two major stages : (i) source localization , and (ii) cancellation of noise sources based on knowledge of the locations of all sound sources (average frame energy) . The source localization , detailed in our previous paper [Liu et al., J. Acoust. Soc. Am. 108, 1888 (2000)] , was based on a well-recognized biological architecture comprising a dual delay-line and a coincidence detection mechanism . This paper focuses on description of the noise cancellation stage . We designed a simple subtraction method which , when strategically employed over the dual delay-line structure in the broadband manner , can effectively cancel multiple interfering sound sources and consequently enhance the desired signal . We obtained an 8-10 dB enhancement for the desired speech in the situations of four talkers in the anechoic acoustic test (or 7-10 dB enhancement in the situations of six talkers in the computer simulation) when all the sounds were equally intense and temporally aligned . (C) 2001 Acoustical Society of America .




IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING. 8 (2): 146-158 MAR 2000

Publication Year: 2000

Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration

University of Illinois

Graupe, Veselinovic
US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (adaptive filter) when a tonal sound signal is detected .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i.e. , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (adaptive filter) calculated in a previous frame in a SNR calculation (noise ratio) .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i.e. , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (noise ratio, SNR LT, SNR calculation) (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (adaptive filter) for a next frame .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i.e. , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .
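
The frame ordering described in claims 15 and 16 (the SNR of the current frame is computed with noise estimates from the previous frame, and the estimates are then updated for the next frame) can be sketched as a simple loop. Everything specific here is an assumption for illustration: the per-band dB averaging, the fixed 3 dB decision threshold, the smoothing factor, and the rule of adapting the noise only on inactive frames.

```python
import math


def frame_snr_db(band_energies, noise_prev):
    """Average per-band SNR in dB using the previous frame's noise estimates."""
    eps = 1e-12  # assumed guard against log of zero and division by zero
    snrs = [10.0 * math.log10((e + eps) / (n + eps))
            for e, n in zip(band_energies, noise_prev)]
    return sum(snrs) / len(snrs)


def process_frames(frames, noise_init, alpha=0.9, snr_thresh_db=3.0):
    """Return per-frame activity decisions and the final noise estimates."""
    noise = list(noise_init)
    decisions = []
    for energies in frames:
        # Decision for this frame relies on noise estimated up to the previous frame.
        active = frame_snr_db(energies, noise) > snr_thresh_db
        decisions.append(active)
        if not active:  # assumed rule: adapt the noise only on inactive frames
            noise = [alpha * n + (1.0 - alpha) * e
                     for n, e in zip(noise, energies)]
    return decisions, noise
```

Feeding a quiet frame, a loud frame and another quiet frame through process_frames marks only the loud frame as active, and the noise estimates are carried unchanged across it.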

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (adaptive filter) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (adaptive filter) when a music signal is detected .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (adaptive filter) on the music signal .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (noise ratio, SNR LT, SNR calculation) (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .
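
For orientation only , the comparator recited in claim 38 can be sketched as below . This is a minimal illustration , assuming a linear threshold function ; the names sad_decision , slope and offset and their default values are hypothetical and are taken neither from the patent nor from the cited paper .

```python
def sad_decision(snr_av: float, snr_lt: float,
                 slope: float = 0.1, offset: float = 2.0) -> bool:
    """Return True (active signal) when the average SNR exceeds a threshold.

    The threshold is a function of the long-term SNR. A linear function is
    assumed here purely for illustration; slope and offset are hypothetical.
    """
    threshold = slope * snr_lt + offset  # threshold depends on SNR_LT
    return snr_av > threshold            # comparator of SNR_av with threshold


if __name__ == "__main__":
    # Same average SNR, different long-term SNR, different decision.
    print(sad_decision(3.0, 20.0))
    print(sad_decision(3.0, 5.0))
```

The sketch only shows the structure of the claim element : a per-frame average SNR is compared against a threshold that moves with the long-term SNR .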

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (adaptive filter) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (adaptive filter) .
Blind Adaptive Filtering Of Speech From Noise Of Unknown Spectrum Using A Virtual Feedback Configuration . The paper describes a single-receiver blind adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) (BAF) of speech from noise where neither speech nor noise are accessible , nor are their parameters known . The only prior knowledge employed by the BAF is that human speech is nonstationary whereas the noise is assumed to be quasistationary , i . e . , stationary over a longer interval than that of any speech phoneme . The BAF has a four subsystem structure . The system consists of an identifying subsystem that is followed by a speech/noise parameter-separator . The noise is identified based on the stationarity features of speech and noise . A feedforward subsystem sets optimization neighborhood to the virtual feedback subsystem where a cost-functional is minimized to jointly minimize the stationary part of the output while maximizing its nonstationary part . The system has been tested for performance for different signal to noise ratios (SNR) and for different types of noise parameters . Improvements for various noises range from 14-36 dB for -20 dB SNR inputs .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI. : 1875-1878 2000

Publication Year: 2000

Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering

Philips Research Lab, Aachen, Germany

Stahl, Fischer, Bippus
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (noise estimation) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation (frequency bins) and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .
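
To make the steps of claim 4 concrete , the following is a minimal sketch , assuming the residual spectra are plain lists of non-negative magnitudes . The helper names find_minima and correlation_map are hypothetical , and the normalization used (cross term divided by the geometric mean of the two energies) is one common choice , not necessarily the formula used in the patent . The cited paper does not describe this computation .

```python
import math


def find_minima(spec):
    """Indices of local minima; both end bins are included as delimiters."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def correlation_map(curr, prev):
    """Per-bin map holding, for each peak, its normalized correlation score."""
    cmap = [0.0] * len(curr)
    minima = find_minima(curr)
    # Each pair of consecutive minima delimits one peak of the current spectrum.
    for lo, hi in zip(minima, minima[1:]):
        c = curr[lo:hi + 1]
        p = prev[lo:hi + 1]
        num = sum(a * b for a, b in zip(c, p))
        den = math.sqrt(sum(a * a for a in c) * sum(b * b for b in p))
        score = num / den if den > 0 else 0.0
        # Assign the peak's score to every bin between its two minima.
        # A minimum shared by two peaks keeps the score of the later peak.
        for k in range(lo, hi + 1):
            cmap[k] = score
    return cmap


if __name__ == "__main__":
    current = [0, 1, 3, 1, 0, 2, 5, 2, 0]
    previous = [0, 1, 3, 1, 0, 0, 0, 0, 0]  # second peak absent previously
    print(correlation_map(current, previous))
```

A peak that persists from the previous frame scores close to 1 over its bins , while a peak with no counterpart scores 0 .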

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation (frequency bins) and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .
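
The two operations of claim 5 can be illustrated as follows . This is a sketch under stated assumptions : the one-pole filter is written as a first-order recursive average , and the function name update_long_term_map and the coefficient alpha are hypothetical , not values from the patent .

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """Filter the correlation map bin by bin, then sum over the bins.

    lt_map: long-term map from the previous frame (one value per bin)
    cmap:   correlation map of the current frame (one value per bin)
    alpha:  hypothetical one-pole filter coefficient (0 < alpha < 1)
    """
    # One-pole (first-order recursive) filter applied to each bin separately.
    new_lt = [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]
    # Summed long-term correlation map: a single scalar per frame.
    return new_lt, sum(new_lt)


if __name__ == "__main__":
    lt, total = update_long_term_map([0.0, 0.0], [1.0, 0.5], alpha=0.5)
    print(lt, total)
```

The per-bin map is carried from frame to frame , while the summed value is the scalar that can be compared with a threshold .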

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (noise estimation) having a magnitude that exceeds a given fixed threshold .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation (frequency bins) and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (noise reduction) indicative of sound activity in the sound signal .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction (adaptive threshold) , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (spectral domain) ;

wherein the tonal stability estimation is performed according to claim 1 .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain (background noise signal, frequency dependent signal) is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise power) .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power (SNR LT, SNR calculation) spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .
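
The reference's first step , estimating the noise power spectrum from temporal quantiles in the power spectral domain , can be sketched roughly as below . The abstract gives no parameters , so the quantile q , the buffer of frames and the function name quantile_noise_estimate are assumptions made for illustration only .

```python
def quantile_noise_estimate(power_frames, q=0.5):
    """Estimate a noise power spectrum as a temporal quantile per bin.

    power_frames: list of frames, each a list of power values per bin
    q:            hypothetical quantile in [0, 1]; 0.5 is the median
    """
    n_frames = len(power_frames)
    n_bins = len(power_frames[0])
    # Index of the q-quantile in the sorted per-bin time track.
    k = min(n_frames - 1, int(q * n_frames))
    noise = []
    for b in range(n_bins):
        track = sorted(frame[b] for frame in power_frames)
        noise.append(track[k])
    return noise


if __name__ == "__main__":
    frames = [[1, 10], [2, 20], [100, 30], [3, 40], [4, 50]]
    # The speech-like outlier (100) in bin 0 does not dominate the estimate.
    print(quantile_noise_estimate(frames, q=0.5))
```

Because the estimate is taken over time for each bin , it needs no explicit pause detection , which is the contrast the abstract draws .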

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal (spectral domain) and prevent update of noise energy estimates on the music signal .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain (background noise signal, frequency dependent signal) is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (audio signal processing) and an average frame energy (audio signal processing) .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing (current frame energy, average frame energy) . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .
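
The four steps of claim 28 can be illustrated with the short sketch below . It assumes the per-band energies are already available as a list ; the function name noise_character , the smoothing coefficient alpha , the guard eps and the choice of the first group as numerator are all hypothetical and are not taken from the patent or the cited paper .

```python
def noise_character(band_energies, n_first, prev_long_term,
                    alpha=0.9, eps=1e-12):
    """Return (ratio, long_term) for one frame.

    band_energies:  energy per frequency band for the current frame
    n_first:        number of bands in the first group; the rest form group two
    prev_long_term: long-term value carried over from the previous frame
    alpha, eps:     hypothetical smoothing coefficient and division guard
    """
    e_first = sum(band_energies[:n_first])   # first energy value
    e_rest = sum(band_energies[n_first:])    # second energy value
    ratio = e_first / (e_rest + eps)         # noise character parameter
    # Long-term value via first-order recursive smoothing (an assumption).
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term


if __name__ == "__main__":
    print(noise_character([1.0, 1.0, 2.0, 2.0], n_first=2,
                          prev_long_term=0.0, alpha=0.5, eps=0.0))
```

Claim 29 then compares the parameter with a fixed threshold ; that comparison is left out of the sketch .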

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal (noise character parameter, activity prediction parameter) is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11 . 7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8 . 6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11.7% . Quantile based noise estimation (frequency bins) and Wiener filtering reduce the error rate to 8.6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (spectral domain) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain (background noise signal, frequency dependent signal) is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11.7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8.6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (spectral domain) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain (background noise signal, frequency dependent signal) is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11.7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8.6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (spectral domain) and preventing update of noise energy estimates .
Quantile Based Noise Estimation For Spectral Subtraction And Wiener Filtering . Elimination of additive noise from a speech signal is a fundamental problem in audio signal processing . In this paper we restrict our considerations to the case where only a single microphone recording of the noisy signal is available . The algorithms which we investigate proceed in two steps : First , the noise power spectrum is estimated . A method based on temporal quantiles in the power spectral domain (background noise signal, frequency dependent signal) is proposed and compared with pause detection and recursive averaging . The second step is to eliminate the estimated noise from the observed signal by spectral subtraction or Wiener filtering . The database used in the experiments comprises 6034 utterances of German digits and digit strings by 770 speakers in 10 different cars . Without noise reduction , we obtain an error rate of 11.7% . Quantile based noise estimation and Wiener filtering reduce the error rate to 8.6% . Similar improvements are achieved in an experiment with artificial , non-stationary noise .




JOURNAL OF THE AUDIO ENGINEERING SOCIETY. 47 (4): 240-251 APR 1999

Publication Year: 1999

Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model

Korea Advanced Institute of Science and Technology (KAIST, 한국과학기술원)

Jeong, Ih
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform (frequency bin) with variable frequency resolution) , and the wavelet packet method .
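
The claim 5 steps charted above (one-pole filtering of the correlation map per frequency bin, then summing over bins) could look roughly like the sketch below. It is illustrative only: the function names and the update factor `alpha=0.9` are assumptions, and the first-order recursion shown is one common form of a one-pole filter rather than the patent's stated equation.

```python
def update_long_term_map(long_term, cor_map, alpha=0.9):
    """One-pole (first-order recursive) smoothing, applied bin by bin.

    long_term: previous long-term correlation map (one value per bin).
    cor_map:   correlation map of the current frame.
    alpha:     assumed update factor; the value 0.9 is hypothetical.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


if __name__ == "__main__":
    state = [0.0, 0.0, 0.0]            # assumed initial value of the map
    for frame_map in ([1.0, 0.2, 0.0], [1.0, 0.1, 0.0]):
        state = update_long_term_map(state, frame_map)
    print(state, summed_long_term_map(state))
```

Bins that stay highly correlated from frame to frame accumulate toward 1, so the summed value grows for tonally stable input and decays otherwise.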

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (analysis method) and a sixteenth order of linear prediction residual error energies (sound signals) .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis of unsteady sound signals (linear prediction residual error energies) considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods (second order) such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (frequency analysis) is detected .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (frequency analysis) from a background noise signal and prevent update of noise energy estimates on the music signal .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform (frequency bin) with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (frequency analysis) from a background noise signal and preventing update of noise energy estimates .
Implementation Of A New Algorithm Using The STFT With Variable Frequency Resolution For The Time-frequency Auditory Model . A signal processing technique is proposed for the time-frequency analysis (music signal) of unsteady sound signals considering the auditory perception model and is called VFR-STFT (short-time Fourier transform with variable frequency resolution) . Conventional STFT , which is commonly used for the spectral analysis of unsteady sounds , is not suitable for the auditory model because the frequency resolution of the spectral analysis within the hearing system is not constant but varies with frequency . The frequency resolution of the VFR-STFT can be adjusted to a number of analyzed frequency ranges by introducing the downsampling technique . With the VFR-STFT , calculation schemes are presented for minimizing undesirable effects , such as the distortion of the overall sound level due to nonoverlapping of the analysis windows and the impairment of partial spectra due to the finite order of antialiasing filters . In addition a procedure for equalizing time grids at all frequency ranges is included in order to describe the two-dimensional time-frequency map (TFM) having different time grids . The proposed VFR-STFT is applied to the spectral analysis of the extraction of tonal components in an unsteady sound . The results are compared to those from other time-frequency analysis methods such as STFT , VFR-FFT (fast Fourier transform with variable frequency resolution) , and the wavelet packet method .




PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6. : 1001-1004 1998

Publication Year: 1998

Noise Reduction By Paired-microphones Using Spectral Subtraction

Japan Advanced Institute of Science and Technology (JAIST 北陸先端科学技術大学院大学)

Mizumachi, Akagi, IEEE
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
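
For orientation, the peak detection and correlation map steps recited in claim 1 above might be sketched as follows. This is a hedged illustration, not the patent's formula: the function names, the strict local-minimum test, and the normalized squared cross-correlation are assumptions chosen to make the steps concrete.

```python
def find_minima(spec):
    """Indices of strict local minima; an endpoint counts if it is lower
    than its only neighbour. Plateaus are not handled in this sketch."""
    n = len(spec)
    minima = []
    for i in range(n):
        left = spec[i - 1] if i > 0 else float("inf")
        right = spec[i + 1] if i < n - 1 else float("inf")
        if spec[i] < left and spec[i] < right:
            minima.append(i)
    return minima


def correlation_map(cur, prev):
    """Correlate each peak of the current residual spectrum with the shape
    at the same bin positions in the previous residual spectrum.

    A peak is the piece of `cur` between two successive minima. The measure
    used here (squared cross-correlation normalized by both energies) is an
    assumed choice; it lies in [0, 1].
    """
    cmap = [0.0] * len(cur)
    minima = find_minima(cur)
    for lo, hi in zip(minima, minima[1:]):
        a = cur[lo:hi + 1]
        b = prev[lo:hi + 1]
        num = sum(x * y for x, y in zip(a, b)) ** 2
        den = sum(x * x for x in a) * sum(y * y for y in b)
        value = num / den if den > 0.0 else 0.0
        # Every bin of the peak gets the same value; a shared minimum bin
        # ends up with the value of the later peak.
        for k in range(lo, hi + 1):
            cmap[k] = value
    return cmap


if __name__ == "__main__":
    current = [0.0, 2.0, 0.0, 3.0, 0.0]
    previous = [0.0, 0.0, 0.0, 3.0, 0.0]
    print(correlation_map(current, previous))
```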

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (microphone array) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
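
The claim 2 steps charted above (searching for minima, connecting them into a spectral floor, subtracting the floor) can be illustrated with the sketch below. It is only a sketch under stated assumptions: the function names are hypothetical, the minima are connected by straight lines (the claim says only "connecting"), and bins outside the first and last minimum are given a zero residual.

```python
def local_minima(spec):
    """Indices of strict local minima; an endpoint counts if it is lower
    than its only neighbour. Plateaus are not handled in this sketch."""
    n = len(spec)
    out = []
    for i in range(n):
        left = spec[i - 1] if i > 0 else float("inf")
        right = spec[i + 1] if i < n - 1 else float("inf")
        if spec[i] < left and spec[i] < right:
            out.append(i)
    return out


def spectral_floor(spec):
    """Piecewise-linear floor through successive minima (assumed linear)."""
    minima = local_minima(spec)
    # Bins not covered by any pair of minima keep their own value,
    # so their residual is zero.
    floor = list(spec)
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spec[lo] + t * (spec[hi] - spec[lo])
    return floor


def residual_spectrum(spec):
    """Spectrum minus its floor, per frequency bin."""
    return [s - f for s, f in zip(spec, spectral_floor(spec))]


if __name__ == "__main__":
    spectrum = [1.0, 5.0, 1.0, 4.0, 3.0]
    print(spectral_floor(spectrum))
    print(residual_spectrum(spectrum))
```

Because the floor passes through every minimum, the residual is zero at those bins and positive in between, which is what makes the pieces between successive minima usable as peaks in the later steps.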

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (microphone array) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
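
The claim 7 search charted above (frequency bins of the correlation map whose magnitude exceeds a given fixed threshold) reduces to a simple scan. In this sketch the function names and the threshold value `0.95` are hypothetical; the claim does not state a value.

```python
def strong_tone_bins(cor_map, threshold=0.95):
    """Return the indices of bins whose correlation exceeds the threshold.

    threshold=0.95 is an assumed placeholder, not a value from the patent.
    """
    return [k for k, v in enumerate(cor_map) if v > threshold]


def has_strong_tone(cor_map, threshold=0.95):
    """True if at least one bin exceeds the fixed threshold."""
    return any(v > threshold for v in cor_map)


if __name__ == "__main__":
    cmap = [0.10, 0.97, 0.50, 0.99]
    print(strong_tone_bins(cmap), has_strong_tone(cmap))
```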

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array) comprises comparing the summed long-term correlation map with an adaptive threshold (noise reduction) indicative of sound activity (microphone array) in the sound signal .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction (adaptive threshold) by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 10
. A method for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (microphone array) is detected .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (microphone array) in the sound signal (microphone array) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection comprises detecting the sound signal (microphone array) based on a frequency dependent signal-to-noise ratio (SNR) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
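
The comparison recited in claim 14 can be illustrated with a minimal sketch. The linear form of the threshold and the `slope` and `offset` values are hypothetical placeholders; neither the claim text nor the cited abstract specifies them.

```python
def snr_threshold(snr_lt, slope=0.1, offset=2.0):
    """Hypothetical threshold: a linear function of the long-term SNR (dB).

    The claim only says the threshold is "a function of" SNR_LT; the
    linear shape and both coefficients are assumptions for illustration.
    """
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Return True when the average SNR is larger than the threshold."""
    return snr_av > snr_threshold(snr_lt)


# Example: with SNR_LT = 20 dB the placeholder threshold is 4 dB.
print(is_active(10.0, 20.0))  # average SNR above the threshold
print(is_active(3.0, 20.0))   # average SNR below the threshold
```

Claims 18 and 19 label the two outcomes of this comparison as an inactive and an active sound signal respectively.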

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection in the sound signal (microphone array) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection further comprises updating the noise estimates for a next frame .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (microphone array) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (microphone array) prevents updating of noise energy estimates when a music signal is detected .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (microphone array) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
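
The two steps of claim 24 (a per-band energy ratio, then a weighted sum over the upper bands) can be sketched as below. The uniform default weights, the `eps` guard against division by zero, and the function and parameter names are assumptions; the claim does not fix any of them.

```python
def spectral_diversity(e_cur, e_prev, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands > min_band.

    e_cur, e_prev: per-band energies of the current and previous frame.
    min_band:      only bands with an index higher than this are used.
    weights:       per-band weights (assumed uniform when omitted).
    """
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for i in range(min_band + 1, len(e_cur)):
        ratio = e_cur[i] / (e_prev[i] + eps)  # eps avoids division by zero
        total += weights[i] * ratio
    return total


# Bands 2 and 3 contribute: 4/2 + 8/2, i.e. approximately 6.
print(spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], min_band=1))
```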

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal (microphone array) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (microphone array) and the complementary non-stationarity parameter .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
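
The four steps of claim 28 can be sketched as follows. Taking each group's energy value as a plain sum, using exponential smoothing for the long-term value, and the smoothing factor `alpha` are all assumptions made for illustration; the claim leaves these choices open.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Each group's energy value is assumed to be a plain sum of its bands.
    """
    first = sum(band_energies[:n_first])    # first group of bands
    second = sum(band_energies[n_first:])   # rest of the bands
    return first / (second + eps)           # eps avoids division by zero


def update_long_term(prev_long_term, current, alpha=0.9):
    """Hypothetical long-term value: exponential smoothing of the parameter."""
    return alpha * prev_long_term + (1.0 - alpha) * current


# Two low bands (total 2) against three remaining bands (total 6).
value = noise_character([1.0, 1.0, 2.0, 2.0, 2.0], n_first=2)
long_term = update_long_term(0.0, value)
print(value, long_term)
```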

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array and subtracts them from the noisy speech signal (noise character parameter, activity prediction parameter) using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
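
The last two elements of claim 30 (a per-peak correlation map and its long-term update) can be sketched as below, taking the residual spectra and the positions of their minima as given. The normalized squared cross-correlation used as the correlation measure, the first-order recursive update, and the value of `alpha` are assumptions; the claim only requires an update factor, the current correlation map, and an initial value.

```python
def correlation_map(res_cur, res_prev, minima):
    """Correlate each peak of the current residual spectrum with the
    same bins of the previous residual spectrum.

    A peak is the piece between two successive minima (indices in `minima`).
    The measure is an assumed normalized squared cross-correlation in [0, 1].
    """
    cmap = [0.0] * len(res_cur)
    for a, b in zip(minima, minima[1:]):
        cur = res_cur[a:b + 1]
        prev = res_prev[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / den if den > 0.0 else 0.0
        # Every bin of the peak gets the peak's value; a shared boundary
        # bin ends up with the value of the later peak.
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap


def update_long_term_map(lt_map, cmap, alpha=0.9):
    """Recursive update with update factor alpha (value is a placeholder)."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


# The first peak repeats from the previous frame, the second one is new.
cmap = correlation_map([0.0, 2.0, 0.0, 3.0, 0.0],
                       [0.0, 2.0, 0.0, 0.0, 0.0],
                       minima=[0, 2, 4])
lt_map = update_long_term_map([0.0] * 5, cmap)  # initial value: all zeros
print(cmap, lt_map)
```

A peak that keeps its position and shape across frames drives the long-term map toward 1 in its bins, which is the sense in which the map indicates tonal stability.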

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (microphone array) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .
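
The locator, estimator, and subtractor of claim 32 can be sketched as three small functions. Treating both end bins as minima and connecting the minima by straight lines are assumptions; the claim says only that the spectral floor "connects the minima of the frequency spectrum with each other".

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are treated as minima (assumption)."""
    n = len(spectrum)
    if n < 2:
        return list(range(n))
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated spectral floor from the spectrum."""
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]


# Minima at bins 0, 2 and 4; the floor is 1.0, 1.0, 1.0, 1.5, 2.0.
print(residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0]))
```

With this construction the residual is zero at every minimum, so the pieces between successive zeros are the peaks that the detector of claim 31 operates on.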

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (microphone array) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 35
. A device for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 36
. A device for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (microphone array) detector .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (microphone array) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (microphone array) detector .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (microphone array) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (microphone array) .
Noise Reduction By Paired-microphones Using Spectral Subtraction . This paper proposes a method of noise reduction by paired microphones as a front-end processor for speech recognition systems . This method estimates noises using a subtractive microphone array (sound signal, sound activity, sound activity detector) and subtracts them from the noisy speech signal using the Spectral Subtraction (SS) . Since this method can estimate noises analytically and frame by frame , it is easy to estimate noises not depending on these acoustic properties . Therefore , this method can also reduce non stationary noises , for example sudden noises when a door has just closed , which can not be reduced by other SS methods . The results of computer simulations and experiments in a real environment show that this method can reduce LPC log spectral envelope distortions .




ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE. 89 (2): 43-53 2006

Publication Year: 2006

Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA

NEC Media & Information Research Labs

Kato, Sugiyama, Serizawa
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (noise estimation, spectral gain) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation (frequency bins) and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain (frequency bins) is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .
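
For illustration only, the correlation-map step recited in claim 4 might be sketched as below. The sketch assumes the residual spectra are non-negative NumPy arrays of equal length, treats the two end bins as delimiting minima, and scores each peak with a squared cross-correlation normalized by the two energies. The function names (find_minima, correlation_map) and that particular normalization are assumptions of this sketch; neither the claim text nor the cited abstract specifies them.

```python
import numpy as np

def find_minima(spectrum):
    """Return indices of local minima; the first and last bins are always
    included so that every peak is delimited on both sides (assumption)."""
    s = np.asarray(spectrum, dtype=float)
    inner = [i for i in range(1, len(s) - 1)
             if s[i] < s[i - 1] and s[i] <= s[i + 1]]
    return sorted(set([0] + inner + [len(s) - 1]))

def correlation_map(current, previous):
    """Assign to every bin of a peak the normalized correlation between the
    current and previous residual spectra over that peak."""
    cur = np.asarray(current, dtype=float)
    prev = np.asarray(previous, dtype=float)
    cmap = np.zeros_like(cur)
    minima = find_minima(cur)
    # A peak is the piece of the spectrum between two consecutive minima.
    for lo, hi in zip(minima[:-1], minima[1:]):
        c = cur[lo:hi + 1]
        p = prev[lo:hi + 1]
        denom = np.sum(c * c) * np.sum(p * p)
        # Hypothetical normalization: squared cross-correlation over energies.
        score = (np.sum(c * p) ** 2) / denom if denom > 0 else 0.0
        cmap[lo:hi + 1] = score
    return cmap
```

With this choice, a peak that keeps its shape and position from one frame to the next scores close to 1, while a peak with no counterpart in the previous frame scores 0.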

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (noise estimation, spectral gain) so as to produce a summed long-term correlation map .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation (frequency bins) and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain (frequency bins) is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .
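
As a rough sketch of the filtering and summing recited in claim 5, a one-pole (first-order recursive) filter can be applied to each frequency bin independently and the result summed over the bins. The update factor alpha and the function name update_long_term_map are assumptions of this sketch; the claim does not give a value for the filter coefficient.

```python
import numpy as np

def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole smoothing of the correlation map, bin by bin, followed by a
    sum over all bins. alpha is a hypothetical update factor in [0, 1)."""
    long_term = np.asarray(long_term, dtype=float)
    cmap = np.asarray(cmap, dtype=float)
    # y[k] = alpha * y_prev[k] + (1 - alpha) * x[k] for every bin k
    updated = alpha * long_term + (1.0 - alpha) * cmap
    return updated, float(np.sum(updated))
```

The caller would carry the returned map forward from frame to frame and compare the summed value against a threshold of its own choosing.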

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (noise estimation, spectral gain) having a magnitude that exceeds a given fixed threshold .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation (frequency bins) and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain (frequency bins) is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (proposed method) when a tonal sound signal is detected .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (proposed method) calculated in a previous frame in a SNR calculation (noise ratio) .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio (noise ratio, SNR LT, SNR calculation) . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (proposed method) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (proposed method) when a music signal is detected .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (proposed method) on the music signal .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (noise suppressor) and an average frame energy .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor (current frame energy) , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (quality evaluations) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations (frequency bands, first frequency bands) , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .
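
One possible reading of the spectral diversity calculation in claim 24 is sketched below. It assumes per-band energies are available as arrays for the current and previous frames, interprets "frequency bands higher than a given number" as band indices strictly greater than min_band, and defaults to uniform weights. The names spectral_diversity, min_band, weights and eps are hypothetical, and the claim does not state the weights.

```python
import numpy as np

def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current-to-previous energy ratios over the bands whose
    index is greater than min_band."""
    cur = np.asarray(cur_energy, dtype=float)[min_band + 1:]
    prev = np.asarray(prev_energy, dtype=float)[min_band + 1:]
    # eps guards against division by zero for silent bands (assumption).
    ratio = cur / (prev + eps)
    if weights is None:
        weights = np.ones_like(ratio)  # uniform weights, assumed
    return float(np.sum(np.asarray(weights, dtype=float) * ratio))
```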

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (quality evaluations) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (proposed method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations (frequency bands, first frequency bands) , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .
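
The noise character parameter of claim 28 could be sketched along the following lines. The sketch assumes per-band energies in an array, takes each group's energy value to be the sum of its band energies, and smooths the ratio with a first-order recursion to obtain the long-term value. The names noise_character, update_long_term, n_first and alpha, and the use of a sum rather than a mean, are assumptions of this sketch.

```python
import numpy as np

def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio between the energy of the first n_first bands and the energy of
    the remaining bands."""
    e = np.asarray(band_energy, dtype=float)
    first = np.sum(e[:n_first])    # first group of frequency bands
    second = np.sum(e[n_first:])   # rest of the frequency bands
    return float(first / (second + eps))

def update_long_term(prev_long_term, value, alpha=0.9):
    """Long-term value as a first-order recursive average; alpha is a
    hypothetical smoothing factor."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```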

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (proposed method) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (noise estimation, spectral gain) so as to produce a summed long-term correlation map .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation (frequency bins) and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain (frequency bins) is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio (noise ratio, SNR LT, SNR calculation) . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (proposed method) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (proposed method) .
Noise Suppression With High Speech Quality Based On Weighted Noise Estimation And MMSE STSA . This paper proposes a high speech quality noise suppression method based on weighted noise estimation and MMSE STSA . The proposed method (noise energy estimates, second energy values) continuously updates the noise estimate , using weighted noisy speech according to the estimated speech-to-noise ratio . In order to fully utilize the improvement offered by noise estimation , the spectral gain is corrected according to the estimated speech-to-noise ratio . By using accurate noise estimation , more accurate SNR than in the conventional method is obtained , which helps to reduce distortion in the enhanced speech . In subjective speech quality evaluations , the five-stage MOS was improved by 0 . 35 and 0 . 40 at the maximum , respectively , for the cases in which the speech was encoded and was not encoded after noise suppression . The improved version , which was developed on the basis of the proposed noise suppressor , satisfies all 3GPP minimum requirements for speech quality and has been installed in a commercially available model . (c) 2005 Wiley Periodicals . Inc .




US20070088540A1

Filed: 2006-01-26     Issued: 2007-04-19

Voice data processing method and device

(Original Assignee) Fujitsu Ltd     (Current Assignee) Fujitsu Ltd

Toshiyuki Ohta, Kazuhiro Nomoto, Kano Asada, Kazunari Hirakawa
US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (signal data) .
US20070088540A1
CLAIM 1
. A voice data processing method comprising : a first step of , in a normal mode , decoding input signal data (SNR calculation) , repeating a calculation in coarse search used for a pitch detection by a predetermined frequency of loops within a required frequency of loops , based on history decode data , and holding a peak value of a normalized cross-correlation obtained by the calculation and a delay data value corresponding thereto ;
and a second step of , in a packet loss mode , executing the pitch detection by repeating a calculation of a normalized cross-correlation in the coarse search by a remaining required frequency of loops , by using the peak value of the normalized cross-correlation and the delay data value , thereby generating compensating data .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (second value) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20070088540A1
CLAIM 3
. The voice data processing method as claimed in claim 2 , wherein the first and the second step respectively include a fifth and a sixth step of invalidating and validating the third and the fourth step respectively when the predetermined frequency of loops is a first value corresponding to a suppression request of a coarse search amount in the normal mode , and of contrarily validating and invalidating the third and the fourth step when the predetermined frequency of loops is a second value (noise ratio) corresponding to a suppression request of a coarse search amount in the packet loss mode .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060130637A1

Filed: 2005-08-01     Issued: 2006-06-22

Method for differentiated digital voice and music processing, noise filtering, creation of special effects and device for carrying out said method

(Original Assignee) Jean-Luc Crebouw     

Jean-Luc Crebouw
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (sound signal) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .
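
Illustrative sketch (not part of either claim): the last three steps of US8990073B2 claim 1 can be pictured roughly as below. The claim does not give formulas, so the squared normalized correlation, the first-order smoothing with `alpha` as the update factor, and the function names are all assumptions made for illustration. The inputs are assumed to be residual spectra of equal length with at least two bins.

```python
def peak_bounds(residual):
    """Peaks are the pieces between pairs of successive minima.
    The two end bins are treated as minima (an assumption)."""
    last = len(residual) - 1
    inner = [i for i in range(1, last)
             if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]]
    minima = [0] + inner + [last]
    return list(zip(minima[:-1], minima[1:]))


def correlation_map(current, previous):
    """Correlate each peak of the current residual spectrum with the shape
    at the same position in the previous residual spectrum."""
    cmap = [0.0] * len(current)
    for lo, hi in peak_bounds(current):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        cross = sum(c * p for c, p in zip(cur, prev))
        energy = sum(c * c for c in cur) * sum(p * p for p in prev)
        # Assumed measure: squared normalized correlation, 0 when undefined.
        value = cross * cross / energy if energy > 0 else 0.0
        for i in range(lo, hi + 1):
            cmap[i] = value  # a bin shared by two peaks keeps the later value
    return cmap


def update_long_term_map(long_term, cmap, alpha=0.9):
    """Assumed first-order update: alpha is the update factor, long_term
    starts from an initial value (for example all zeros)."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]
```

A stable tone yields a correlation map near 1 at its peak, so repeated updates push the long-term map upward there, while frame-to-frame changes pull it down.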

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (sound signal) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .
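
Illustrative sketch (not part of either claim): the three steps of US8990073B2 claim 2 (search for minima, connect them into a spectral floor, subtract) might look like the following. Straight-line interpolation between minima and the treatment of the two end bins as minima are assumptions; the claim only says the minima are connected with each other. The spectrum is assumed to have at least two bins.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are always included
    (an assumption, so that the floor spans every bin)."""
    last = len(spectrum) - 1
    inner = [i for i in range(1, last)
             if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]
    return [0] + inner + [last]


def spectral_floor(spectrum):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [float(s) for s in spectrum]
    minima = find_minima(spectrum)
    for lo, hi in zip(minima[:-1], minima[1:]):
        for i in range(lo, hi + 1):
            t = (i - lo) / (hi - lo)
            floor[i] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    return [s - f for s, f in zip(spectrum, spectral_floor(spectrum))]
```

For the toy spectrum `[1, 3, 1, 5, 2, 4, 2]` the floor passes through the minima at bins 0, 2, 4 and 6, leaving only the three peaks in the residual.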

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (sound signal) .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (sound signal) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (sound signal) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .
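
Illustrative sketch (not part of either claim): the two strong-tone tests recited in US8990073B2 claims 7 and 8 reduce to a per-bin comparison and a comparison of a sum. The threshold value 0.95 and the function names are hypothetical, and how the adaptive threshold is adapted is not modeled here; it is simply passed in.

```python
def strong_tone_bins(cmap, fixed_threshold=0.95):
    """Claim 7 style test: bins of the correlation map whose magnitude
    exceeds a given fixed threshold (0.95 is a placeholder value)."""
    return [k for k, value in enumerate(cmap) if value > fixed_threshold]


def is_tonal(long_term_map, adaptive_threshold):
    """Claim 8 style test: compare the summed long-term correlation map
    with a threshold supplied by the caller."""
    return sum(long_term_map) > adaptive_threshold
```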

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (sound signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 29
. Method according to claim 28 , characterized in that said shifted signals are multiplied by a same coefficient , and the original signal by a second coefficient , the sum of said first coefficient (background noise signal) , added to itself , and of said second coefficient is equal to 1 , reduced in order to retain an equivalent level of the resultant signal .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (sound signal) is detected .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (sound signal) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (sound signal) based on a frequency dependent signal-to-noise ratio (SNR) .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (sound signal) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (sound signal) and a ratio between a second order and a sixteenth order of linear prediction (ambient noise) residual error energies .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 35
. Device according to claim 34 , characterized in that said means of analysis comprise : means of calculation of the envelope of the signal , means of calculation of the pitch and of its variation , means of application of the inverse variation of the pitch to the temporal signal , means for the Fast Fourrier Transformation (FFT) of the preprocessed signal , means of extraction of the frequential components and their amplitudes from said signal , from the result of the Fast Fourrier Transformation , means of optional elimination of the ambient noise (linear prediction) by selective filtering before coding .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (sound signal) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (sound signal) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .
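
Illustrative sketch (not part of either claim): US8990073B2 claims 18 and 19 classify a frame by comparing the average SNR with a calculated threshold, and claim 38 describes that threshold as a function of the long-term SNR. The linear form of the threshold and its `slope` and `offset` values are hypothetical. The claims do not say what happens when the two values are equal; this sketch treats that case as inactive.

```python
def sad_threshold(snr_lt, slope=0.5, offset=2.0):
    """Hypothetical threshold that grows with the long-term SNR."""
    return slope * snr_lt + offset


def classify_frame(snr_av, snr_lt):
    """Active when the average SNR is larger than the threshold,
    inactive otherwise (equality handled as inactive by assumption)."""
    return "active" if snr_av > sad_threshold(snr_lt) else "inactive"
```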

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (sound signal) prevents updating of noise energy estimates when a music signal is detected .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first coefficient) and prevent update of noise energy estimates on the music signal .
US20060130637A1
CLAIM 29
. Method according to claim 28 , characterized in that said shifted signals are multiplied by a same coefficient , and the original signal by a second coefficient , the sum of said first coefficient (background noise signal) , added to itself , and of said second coefficient is equal to 1 , reduced in order to retain an equivalent level of the resultant signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (sound signal) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .
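
Illustrative sketch (not part of either claim): the spectral diversity of US8990073B2 claim 24 is a weighted sum of per-band energy ratios between the current and previous frame, taken over bands above a given number. The plain current-over-previous ratio, the default unit weights, and the parameter names are assumptions; band energies are assumed strictly positive.

```python
def spectral_diversity(cur_energy, prev_energy, band_number, weights=None):
    """Weighted sum of energy ratios for bands with index above band_number."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # placeholder weighting
    bands = range(band_number + 1, len(cur_energy))
    return sum(weights[i] * cur_energy[i] / prev_energy[i] for i in bands)
```

With `band_number=1`, only bands 2 and above contribute, so energy changes in the lowest bands do not affect the result.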

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (negative value) indicative of an activity of the sound signal (sound signal) .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 28
. Method according to claim 25 , characterized in that said stage of filtering of the noise and said stage of generation of special effects , from the analysis , without passing though the synthesis , comprise a sum of the original signal , of the original signal shifted by one pitch in positive value and of the original signal shifted by one pitch in negative value (activity prediction parameter) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (negative value) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (sound signal) and the complementary non-stationarity parameter .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 28
. Method according to claim 25 , characterized in that said stage of filtering of the noise and said stage of generation of special effects , from the analysis , without passing though the synthesis , comprise a sum of the original signal , of the original signal shifted by one pitch in positive value and of the original signal shifted by one pitch in negative value (activity prediction parameter) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (negative value) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20060130637A1
CLAIM 28
. Method according to claim 25 , characterized in that said stage of filtering of the noise and said stage of generation of special effects , from the analysis , without passing though the synthesis , comprise a sum of the original signal , of the original signal shifted by one pitch in positive value and of the original signal shifted by one pitch in negative value (activity prediction parameter) .
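
Illustrative sketch (not part of either claim): the condition in US8990073B2 claim 27 blocks the noise energy update only when both parameters exceed their own fixed thresholds at the same time. The function and parameter names are hypothetical, and no threshold values are implied.

```python
def noise_update_allowed(act_pred, nonstat, act_threshold, nonstat_threshold):
    """Return False when the activity prediction parameter and the
    complementary non-stationarity parameter are simultaneously above
    their respective fixed thresholds."""
    blocked = act_pred > act_threshold and nonstat > nonstat_threshold
    return not blocked
```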

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (sound signal) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (sound signal) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (sound signal) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourrier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourrier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (sound signal) .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (sound signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 29
. Method according to claim 28 , characterized in that said shifted signals are multiplied by a same coefficient , and the original signal by a second coefficient , the sum of said first coefficient (background noise signal) , added to itself , and of said second coefficient is equal to 1 , reduced in order to retain an equivalent level of the resultant signal .
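The coefficient constraint recited in claim 29 of US20060130637A1 can be read as 2 * c1 + c2 = 1, where c1 weights each shifted signal and c2 weights the original. The following is a minimal sketch of that reading only. It assumes exactly two shifted copies (claim 28 is not reproduced in this chart), and the function name `mix_with_shifted` and its parameters are hypothetical.

```python
def mix_with_shifted(original, shifted_a, shifted_b, first_coeff):
    """Mix an original signal with two shifted copies.

    Assumption: two shifted signals share `first_coeff`, and the second
    coefficient is chosen so that 2 * first_coeff + second_coeff == 1,
    which keeps the level of the result equivalent to the original.
    """
    second_coeff = 1.0 - 2.0 * first_coeff
    return [
        second_coeff * x + first_coeff * (a + b)
        for x, a, b in zip(original, shifted_a, shifted_b)
    ]
```

With identical inputs the mix returns the input level unchanged, which is one way to check that the coefficients sum as the claim language suggests.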

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (sound signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 29
. Method according to claim 28 , characterized in that said shifted signals are multiplied by a same coefficient , and the original signal by a second coefficient , the sum of said first coefficient (background noise signal) , added to itself , and of said second coefficient is equal to 1 , reduced in order to retain an equivalent level of the resultant signal .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (pre-processed signal) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal (noise estimator) , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (sound signal) for distinguishing a music signal from a background noise signal (first coefficient) and preventing update of noise energy estimates .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .

US20060130637A1
CLAIM 29
. Method according to claim 28 , characterized in that said shifted signals are multiplied by a same coefficient , and the original signal by a second coefficient , the sum of said first coefficient (background noise signal) , added to itself , and of said second coefficient is equal to 1 , reduced in order to retain an equivalent level of the resultant signal .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (sound signal) .
US20060130637A1
CLAIM 22
. Method for the differentiated digital processing of a sound signal (sound signal) , constituted in the interval of a frame by the sum of sines of fixed amplitude and of which the frequency is modulated linearly as a function of time , this sum being modulated temporally by an envelope , the noise of said sound signal being added to said signal , prior to said sum , characterized in that it comprises : a stage of analysis making it possible to determine parameters representing said sound signal by a calculation of the envelope of the signal , a calculation of the period of the fundamental of the voice signal (pitch) and of its variation , an application to the temporal signal of the inverse variation of the pitch , a Fast Fourier Transformation (FFT) of the pre-processed signal , an extraction of the signal frequential components and their amplitudes from the result of the Fast Fourier Transformation , a calculation of the pitch and its validation in the frequential domain .




JP2007025290A

Filed: 2005-07-15     Issued: 2007-02-01

Apparatus for controlling reverberation in a multichannel audio codec

(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     

Akihisa Kawamura, Sen Chon Kok, Shuji Miyasaka, Takeshi Norimatsu, Koshiro Ono, Yoshiaki Takagi, セン・チョン コク, 武志 則松, 修二 宮阪, 耕司郎 小野, 明久 川村, 良明 高木
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (スペクトル領域 , spectral region) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2007025290A
CLAIM 4
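The method according to claims 1 and 2 , characterized in that said parameters indicate tonality in different spectral regions (スペクトル領域 ; frequency spectrum) of said downmix signal , and each parameter is applied to said all-pass filter corresponding to the same spectral region .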
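The peak-detection and correlation-map steps recited in claim 1 of US8990073B2 can be illustrated with a short sketch. It takes two residual spectra as plain lists and assumes a normalized squared cross-correlation per peak; the claim does not fix the correlation formula, so that choice, together with the names `peak_segments` and `correlation_map`, is an assumption made for illustration.

```python
def peak_segments(residual):
    """Return (start, end) bin pairs, one per peak.

    Each peak is taken as the piece of the residual spectrum between two
    successive minima. The first and last bins are treated as minima
    (an assumption) so that every bin belongs to some segment.
    """
    n = len(residual)
    minima = [0]
    for i in range(1, n - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return list(zip(minima, minima[1:]))


def correlation_map(current, previous):
    """Correlate each peak of `current` with the same bins of `previous`.

    The value written to every bin of a peak is the normalized squared
    cross-correlation of the two shapes (assumed formula). A bin shared
    by two neighbouring peaks keeps the value of the later peak.
    """
    cmap = [0.0] * len(current)
    for a, b in peak_segments(current):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = (num * num) / den if den > 0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap
```

A peak that keeps its shape and position from one frame to the next yields a value near 1, while a peak with no counterpart in the previous frame yields 0.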

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (スペクトル領域 , spectral region) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2007025290A
CLAIM 4
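The method according to claims 1 and 2 , characterized in that said parameters indicate tonality in different spectral regions (スペクトル領域 ; frequency spectrum) of said downmix signal , and each parameter is applied to said all-pass filter corresponding to the same spectral region .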
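The three steps recited in claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. Connecting the minima by straight line segments is an assumed reading of "connecting", and the function names are hypothetical; the input is a list of at least two spectral magnitudes.

```python
def local_minima(spectrum):
    """Indices of local minima; both end bins are always included
    (an assumption, so that the floor spans the whole spectrum)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("spectrum needs at least two bins")
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight line segments."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    floor = spectral_floor(spectrum, local_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]
```

By construction the residual is zero at every minimum, so what remains are the peaks that rise above the floor.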

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (スペクトル領域 , spectral region) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007025290A
CLAIM 4
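The method according to claims 1 and 2 , characterized in that said parameters indicate tonality in different spectral regions (スペクトル領域 ; frequency spectrum) of said downmix signal , and each parameter is applied to said all-pass filter corresponding to the same spectral region .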

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (スペクトル領域 , spectral region) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007025290A
CLAIM 4
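The method according to claims 1 and 2 , characterized in that said parameters indicate tonality in different spectral regions (スペクトル領域 ; frequency spectrum) of said downmix signal , and each parameter is applied to said all-pass filter corresponding to the same spectral region .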

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (スペクトル領域 , spectral region) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2007025290A
CLAIM 4
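The method according to claims 1 and 2 , characterized in that said parameters indicate tonality in different spectral regions (スペクトル領域 ; frequency spectrum) of said downmix signal , and each parameter is applied to said all-pass filter corresponding to the same spectral region .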




US20060224381A1

Filed: 2005-04-04     Issued: 2006-10-05

Detecting speech frames belonging to a low energy sequence

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Oyj

Jari Makinen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (energy levels) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060224381A1
CLAIM 9
. The method according to claim 1 , wherein said scaling is performed based on one of a plurality of correction functions , each correction function being valid for another range of speech energy levels (frequency bin basis) .
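The long-term correlation map of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over the bins) can be sketched as below. The update factor `alpha`, its default value, and the zero initial map used in the example are hypothetical choices; the claim names an update factor and an initial value without giving numbers.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing, one filter per bin.

    `long_term` is the map from the previous frame, `cmap` the correlation
    map of the current frame, and `alpha` the assumed update factor.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Starting from an initial map of zeros, feeding the same correlation map frame after frame makes the summed value rise steadily, which is the behaviour expected of a tonally stable signal under this reading.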

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060224381A1
CLAIM 5
. The method according to claim 4 , wherein in case said determined speech energy in said current speech frame is higher than a speech energy in said preceding speech frame , said available speech energy level is weighted with a first coefficient (background noise signal) , wherein in case said determined speech energy in said current speech frame is lower than said speech energy in said preceding speech frame , said available speech energy level is weighted with a second coefficient , and wherein said first coefficient is higher than said second coefficient .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (when r) .
US20060224381A1
CLAIM 25
. A software program product in which a software code for detecting speech frames belonging to a low energy sequence of a speech signal is stored , said software code realizing the following steps when running (next frame) in a processing unit of an electronic device : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (when r) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction residual error (energy threshold) energies .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold (residual error) value .

US20060224381A1
CLAIM 25
. A software program product in which a software code for detecting speech frames belonging to a low energy sequence of a speech signal is stored , said software code realizing the following steps (second order) when running (next frame) in a processing unit of an electronic device : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .
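The four steps recited in claims 1 and 25 of US20060224381A1 (frame energy, level estimate, conditional scaling, threshold decision) can be sketched as follows. The mean-based level estimate and the scaling by the ratio of nominal to estimated level are assumptions; the claims only require that the energy be scaled when the estimate deviates from the nominal level by a predetermined amount. All names and parameters are hypothetical.

```python
def frame_energy(samples):
    """Mean squared amplitude of one speech frame."""
    return sum(s * s for s in samples) / len(samples)


def estimate_level(recent_energies):
    """Speech energy level over several frames (assumed: plain mean)."""
    return sum(recent_energies) / len(recent_energies)


def is_low_energy_frame(samples, level_estimate, nominal_level, margin, threshold):
    """Decide whether the current frame belongs to a low energy sequence.

    The energy is scaled only when the estimated level deviates from the
    nominal level by at least `margin`. The assumed scale factor is below 1
    when the level is above nominal (scale down) and above 1 when it is
    below nominal (scale up). `level_estimate` must be positive.
    """
    energy = frame_energy(samples)
    if abs(level_estimate - nominal_level) >= margin:
        energy *= nominal_level / level_estimate
    return energy < threshold
```

The same frame can therefore be classified differently depending on how loud the talker is on average, which is the effect the scaling step is meant to achieve.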

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal (first coefficient) and prevent update of noise energy estimates on the music signal .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US20060224381A1
CLAIM 5
. The method according to claim 4 , wherein in case said determined speech energy in said current speech frame is higher than a speech energy in said preceding speech frame , said available speech energy level is weighted with a first coefficient (background noise signal) , wherein in case said determined speech energy in said current speech frame is lower than said speech energy in said preceding speech frame , said available speech energy level is weighted with a second coefficient , and wherein said first coefficient is higher than said second coefficient .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .
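The spectral diversity computation of claim 24 of US8990073B2 (a per-band energy ratio between the current and previous frames, restricted to bands above a given number, then a weighted sum) can be sketched as below. The weights, the guard value `eps`, and the function name are hypothetical; the claim does not state which weights are used.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands > min_band.

    `cur_energy` and `prev_energy` hold one energy value per frequency band.
    `weights` defaults to 1.0 for every band (an assumption), and `eps`
    guards against division by zero.
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    return sum(
        weights[b] * cur_energy[b] / max(prev_energy[b], eps)
        for b in range(min_band + 1, len(cur_energy))
    )
```

With `min_band=1`, only bands 2 and above contribute, so changes confined to the lowest bands leave the parameter unchanged.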

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .
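The noise character parameter of claim 28 of US8990073B2 (two groups of frequency bands, one energy value per group, their ratio, and a long-term value) can be sketched as follows. Taking the ratio as first group over second group and using exponential smoothing for the long-term value are assumptions, as are the names and the smoothing factor `alpha`.

```python
def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio of the energy in the first `n_first` bands to the energy in
    the remaining bands (assumed direction of the ratio)."""
    first = sum(band_energy[:n_first])
    rest = sum(band_energy[n_first:])
    return first / max(rest, eps)


def long_term_value(previous, current, alpha=0.9):
    """Long-term value by exponential smoothing (assumed form)."""
    return alpha * previous + (1.0 - alpha) * current
```

Under this reading, a frame whose energy is concentrated in the first group of bands gives a large ratio, and the smoothed value changes only gradually from frame to frame.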

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20060224381A1
CLAIM 1
. A method for detecting speech frames belonging to a low energy sequence of a speech signal (noise character parameter, activity prediction parameter) , said method comprising : determining a speech energy in a current speech frame ;
estimating a speech energy level based on a speech energy in a plurality of speech frames ;
if said estimated speech energy level deviates at least by a predetermined amount from a predetermined nominal speech energy level , scaling said determined speech energy in said current speech frame ;
and deciding that said current speech frame belongs to a low energy sequence , if said , potentially scaled , frame energy is lower than a predetermined low energy threshold value .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060224381A1
CLAIM 8
. The method according to claim 1 , wherein in case said estimated speech energy level exceeds said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame (current frame) is scaled to a lower value , and wherein in case said estimated speech energy level falls short of said nominal speech level at least by a predetermined amount , said determined speech energy for said current frame is scaled to a higher value .

US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (energy levels) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060224381A1
CLAIM 9
. The method according to claim 1 , wherein said scaling is performed based on one of a plurality of correction functions , each correction function being valid for another range of speech energy levels (frequency bin basis) .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060224381A1
CLAIM 5
. The method according to claim 4 , wherein in case said determined speech energy in said current speech frame is higher than a speech energy in said preceding speech frame , said available speech energy level is weighted with a first coefficient (background noise signal) , wherein in case said determined speech energy in said current speech frame is lower than said speech energy in said preceding speech frame , said available speech energy level is weighted with a second coefficient , and wherein said first coefficient is higher than said second coefficient .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060224381A1
CLAIM 5
. The method according to claim 4 , wherein in case said determined speech energy in said current speech frame is higher than a speech energy in said preceding speech frame , said available speech energy level is weighted with a first coefficient (background noise signal) , wherein in case said determined speech energy in said current speech frame is lower than said speech energy in said preceding speech frame , said available speech energy level is weighted with a second coefficient , and wherein said first coefficient is higher than said second coefficient .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20060224381A1
CLAIM 20
. The encoding module according to claim 14 , further comprising a speech encoder (frequency spectrum, noise estimator) adapted to encode a current speech frame , wherein said current speech frame is encoded with a dedicated low bit rate coding mode , in case said current speech frame is detected to belong to a low energy sequence .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first coefficient) and preventing update of noise energy estimates .
US20060224381A1
CLAIM 5
. The method according to claim 4 , wherein in case said determined speech energy in said current speech frame is higher than a speech energy in said preceding speech frame , said available speech energy level is weighted with a first coefficient (background noise signal) , wherein in case said determined speech energy in said current speech frame is lower than said speech energy in said preceding speech frame , said available speech energy level is weighted with a second coefficient , and wherein said first coefficient is higher than said second coefficient .




US20050143989A1

Filed: 2004-12-22     Issued: 2005-06-30

Method and device for speech enhancement in the presence of background noise

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Milan Jelinek
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20050143989A1
CLAIM 1
. A method for noise suppression of a speech signal , comprising : for a speech signal having a frequency domain representation (frequency spectrum) dividable into a plurality of frequency bins , determining a value of a scaling gain for at least some of said frequency bins ;
and calculating smoothed scaling gain values , comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (domain representation) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050143989A1
CLAIM 1
. A method for noise suppression of a speech signal , comprising : for a speech signal having a frequency domain representation (frequency spectrum) dividable into a plurality of frequency bins , determining a value of a scaling gain for at least some of said frequency bins ;
and calculating smoothed scaling gain values , comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (noise energy estimates) when a tonal sound signal is detected .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (noise energy estimates) calculated in a previous frame in a SNR calculation (noise ratio) .
US20050143989A1
CLAIM 2
. A method as in claim 1 , where determining the value of the scaling gain comprises using a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) .

US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (noise energy estimates) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (noise energy estimates) when a music signal is detected .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (noise energy estimates) on the music signal .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (noise energy estimates) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (noise energy estimates) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050143989A1
CLAIM 1
. A method for noise suppression of a speech signal , comprising : for a speech signal having a frequency domain representation (frequency spectrum) dividable into a plurality of frequency bins , determining a value of a scaling gain for at least some of said frequency bins ;
and calculating smoothed scaling gain values , comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050143989A1
CLAIM 1
. A method for noise suppression of a speech signal , comprising : for a speech signal having a frequency domain representation (frequency spectrum) dividable into a plurality of frequency bins , determining a value of a scaling gain for at least some of said frequency bins ;
and calculating smoothed scaling gain values , comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (domain representation) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050143989A1
CLAIM 1
. A method for noise suppression of a speech signal , comprising : for a speech signal having a frequency domain representation (frequency spectrum) dividable into a plurality of frequency bins , determining a value of a scaling gain for at least some of said frequency bins ;
and calculating smoothed scaling gain values , comprising for said at least some of said frequency bins combining a currently determined value of the scaling gain and a previously determined value of the smoothed scaling gain .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20050143989A1
CLAIM 2
. A method as in claim 1 , where determining the value of the scaling gain comprises using a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) .
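
Illustrative sketch (not taken from either document): the comparator recited in claim 38 of US8990073B2, which compares an average SNR with a threshold that is a function of a long-term SNR, could be outlined as follows. The linear form of the threshold function and the slope and offset values are assumptions; the claim does not specify them.

```python
def snr_based_activity(snr_av_db, snr_lt_db, slope=0.25, offset_db=2.0):
    """Return True when the average SNR exceeds an SNR_LT-dependent threshold.

    slope and offset_db are hypothetical constants of an assumed linear
    threshold function.
    """
    threshold_db = slope * snr_lt_db + offset_db
    return snr_av_db > threshold_db
```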

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (noise energy estimates) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (noise energy estimates) .
US20050143989A1
CLAIM 29
. A method as in claim 26 , where a decision whether to update noise energy estimates (noise energy estimates) per critical band during inactive speech periods is based on parameters substantially independent of a signal-to-noise ratio (SNR) per critical band .




US20050065781A1

Filed: 2004-11-15     Issued: 2005-03-24

Method for analysing audio signals

(Original Assignee) Empire Interactive Europe Ltd     (Current Assignee) Empire Interactive Europe Ltd

Andreas Tell, Bernhard Throll
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (analysis filterbank, band signals) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (analysis filterbank, band signals) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .
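
Illustrative sketch (not taken from either document): the three steps of claim 2 of US8990073B2 (searching for minima, connecting them to form a spectral floor, subtracting the floor) could be outlined as follows. Treating both end bins as minima and connecting minima with straight line segments are assumptions; the claim only says the minima are connected with each other.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are treated as minima (assumption)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def residual_spectrum(spectrum):
    """Subtract a floor formed by joining successive minima with line segments."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for left, right in zip(minima, minima[1:]):
        span = right - left
        for i in range(left, right + 1):
            t = (i - left) / span
            # Linear interpolation between the two delimiting minima.
            floor[i] = (1.0 - t) * spectrum[left] + t * spectrum[right]
    return [s - f for s, f in zip(spectrum, floor)]
```

For the toy spectrum [1, 3, 1, 5, 1] the floor is flat at 1 and the residual is [0, 2, 0, 4, 0].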

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (extracted signal, local maxima) of the current residual spectrum .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .
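
Illustrative sketch (not taken from either document): peak detection as recited in claims 1 and 3 of US8990073B2, where each peak is the piece of the residual spectrum between a pair of successive minima and its maximum is located within that piece, could be outlined as follows. The tuple layout and the treatment of the end bins as minima are assumptions.

```python
def detect_peaks(residual):
    """Return (left_min, right_min, max_index) for each piece between successive minima."""
    n = len(residual)
    # Local minima of the residual; end bins are treated as minima (assumption).
    minima = [0] + [
        i for i in range(1, n - 1)
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]
    ]
    if n > 1:
        minima.append(n - 1)

    peaks = []
    for left, right in zip(minima, minima[1:]):
        # Locate the maximum between the two consecutive minima.
        top = max(range(left, right + 1), key=lambda i: residual[i])
        peaks.append((left, right, top))
    return peaks
```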

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (extracted signal, local maxima) with the previous residual spectrum , over frequency bins (analysis filterbank, band signals) between two consecutive minima (extracted signal, local maxima) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .
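
Illustrative sketch (not taken from either document): the correlation map of claim 4 of US8990073B2 could be outlined as follows. For each peak, a normalized correlation with the previous residual spectrum is computed over the bins between the delimiting minima and then assigned to all of those bins. The particular normalization (cross term divided by the geometric mean of the two energies) and the zero score for an empty denominator are assumptions.

```python
import math


def correlation_map(current, previous, peaks):
    """Build a per-bin map of normalized correlation scores.

    peaks is a list of (left_min, right_min) bin indices delimiting each peak.
    """
    cmap = [0.0] * len(current)
    for left, right in peaks:
        cur = current[left:right + 1]
        prev = previous[left:right + 1]
        cross = sum(c * p for c, p in zip(cur, prev))
        norm = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = cross / norm if norm > 0.0 else 0.0
        # Assign the peak's score to every bin between its two minima.
        # A minimum shared by two peaks takes the score of the later peak.
        for i in range(left, right + 1):
            cmap[i] = score
    return cmap
```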

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (analysis filterbank, band signals) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (analysis filterbank, band signals) so as to produce a summed long-term correlation map .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .
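
Illustrative sketch (not taken from either document): the long-term correlation map of claims 1 and 5 of US8990073B2 could be outlined as a one-pole filter applied to each frequency bin, followed by a sum over bins. The exact filter form, the update factor value, and a zero initial map are assumptions.

```python
def update_long_term_map(long_term_map, corr_map, update_factor=0.9):
    """One-pole smoothing per bin, then a sum over bins.

    Returns the updated long-term map and its sum over all bins.
    """
    updated = [
        update_factor * lt + (1.0 - update_factor) * c
        for lt, c in zip(long_term_map, corr_map)
    ]
    return updated, sum(updated)
```

Starting from an assumed initial map of zeros and feeding a correlation map of ones with an update factor of 0.5, the summed value over two bins grows from 1.0 after the first frame to 1.5 after the second, which is the kind of build-up a stable tone would produce.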

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (analysis filterbank, band signals) having a magnitude that exceeds a given fixed threshold .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal (extracted signal, local maxima) ;

wherein the tonal stability estimation is performed according to claim 1 .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating (frequency noise) of noise energy estimates when a music signal (frequency analysis) is detected .
US20050065781A1
CLAIM 8
. The method according to claim 1 , wherein the mapping into the rhythm excitation layer consists of a linear mapping for frequency noise (average signal, sound signal prevents updating) suppression and for time correlation , which is applied to the logarithm of the spectral magnitude .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (frequency analysis) from a background noise signal (extracted signal, local maxima) and prevent update of noise energy estimates on the music signal .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (predetermined time interval) .
US20050065781A1
CLAIM 19
. The method according to claim 17 , wherein the residual signal is modeled by calculating a distribution function from the statistic moments at predetermined time intervals (average frame energy) .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (analysis filterbank, band signals) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .
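
As a reading aid, the spectral diversity computation recited in claim 24 of US8990073B2 above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name, the `min_band` and `weights` parameters, the uniform default weighting, the current-over-previous orientation of the ratio, and the `eps` guard are all hypothetical choices made here for illustration.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    cur_energy / prev_energy: per-band energies of the current and previous frame.
    min_band: the "given number"; only bands above it are used (assumption:
              bands are indexed from 0).
    weights:  one weight per retained band; uniform weights if omitted (assumption).
    eps:      small constant guarding against division by zero (assumption).
    """
    cur = cur_energy[min_band + 1:]
    prev = prev_energy[min_band + 1:]
    # Ratio between current-frame and previous-frame energy, band by band.
    ratios = [c / (p + eps) for c, p in zip(cur, prev)]
    if weights is None:
        weights = [1.0] * len(ratios)
    if len(weights) != len(ratios):
        raise ValueError("need exactly one weight per retained band")
    # Weighted sum over all bands higher than the given number.
    return sum(w * r for w, r in zip(weights, ratios))
```

A call such as `spectral_diversity(cur, prev, min_band=1)` ignores bands 0 and 1 and sums the ratios of the remaining bands.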

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (analysis filterbank, band signals) into a first group (analysis filterbank, band signals) of a certain number of first frequency (analysis filterbank, band signals) bands and a second group (steps a) of a rest of the frequency bands ;

calculating a first energy (analysis filterbank, band signals) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US20050065781A1
CLAIM 22
. The method according to claim 21 , wherein compression comprises the steps of : a) adaptive double-differential coding of the PEL streams , b) time-localized coding of the REL events , c) adaptive differential coding of the residual signal , d) statistic compression of the data from steps a) (second group) , b) and c) by entropy maximization .
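
As a reading aid, the noise character parameter of claim 28 of US8990073B2 above can be sketched as follows. This is a minimal sketch under stated assumptions: the claim does not say how each group's energy value is formed or how the long-term value is updated, so the summation per group, the first-order smoothing, and the names `n_first`, `alpha`, and `eps` are hypothetical.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio between the energy of the first n_first bands and the energy of the rest.

    Assumption: each group's energy value is the plain sum of its band energies.
    eps guards against division by zero (assumption).
    """
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    return first_energy / (second_energy + eps)


def update_long_term(prev_long_term, value, alpha=0.9):
    """Long-term value via first-order smoothing (assumed form; alpha is hypothetical)."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

Calling `update_long_term` once per frame with the latest `noise_character` result yields a slowly varying long-term value.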

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (analysis filterbank, band signals) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (analysis filterbank, band signals) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (analysis filterbank, band signals) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (analysis filterbank, band signals) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (analysis filterbank, band signals) so as to produce a summed long-term correlation map .
US20050065781A1
CLAIM 14
. The method according to claim 1 , wherein the frequency streams are calculated as a development according to the band signals (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) of a filterbank , the coefficients being given by projections of a frequency evaluation onto the frequency responses of the filterbank .

US20050065781A1
CLAIM 16
. The method according to claim 15 , wherein the frequency evaluation is carried out with an FFT filter or an analysis filterbank (first frequency, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal (extracted signal, local maxima) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (frequency analysis) from a background noise signal (extracted signal, local maxima) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (frequency noise) to noise ratio (extracted signal, local maxima) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 8
. The method according to claim 1 , wherein the mapping into the rhythm excitation layer consists of a linear mapping for frequency noise (average signal, sound signal prevents updating) suppression and for time correlation , which is applied to the logarithm of the spectral magnitude .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (extracted signal, local maxima) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (frequency analysis) from a background noise signal (extracted signal, local maxima) and preventing update of noise energy estimates .
US20050065781A1
CLAIM 7
. The method according to claim 3 , wherein the separation of the frequency streams is achieved by searching for time-coherent local maxima (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) and calculation of the pitch data as a time series .

US20050065781A1
CLAIM 12
. The method according to claim 10 wherein the extracted signal (background noise signal, noise ratio, term signal, noise estimator, consecutive minima, two consecutive minima, correlation value) is multiplied by a complex-valued envelope to adapt the phase with an optimization method .

US20050065781A1
CLAIM 18
. The method according to claim 17 , wherein several bands with frequency-localized noise are used for modeling , the bands being added according to a frequency analysis (music signal) with a time-dependent weighting .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20040133424A1

Filed: 2003-10-22     Issued: 2004-07-08

Processing speech signals

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Motorola Solutions Inc

Douglas Ealey, Holly Louise Kelleher, David Pearce
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (following group) , the correlation map of a current frame , and an initial value of the long term correlation map .
US20040133424A1
CLAIM 1
. A method of processing a speech signal in noise , comprising : determining a frequency spectrum (frequency spectrum) of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US20040133424A1
CLAIM 23
. A method according to any preceding claim , further comprising using the resulting harmonic band data in at least one of the following group (update factor) of processes : (i) automatic speech recognition ;
(ii) front-end processing in distributed automatic speech recognition ;
(iii) speech enhancement ;
(iv) echo cancellation ;
(v) speech coding .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20040133424A1
CLAIM 1
. A method of processing a speech signal in noise , comprising : determining a frequency spectrum (frequency spectrum) of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .
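
As a reading aid, the residual spectrum steps of claim 2 of US8990073B2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is a minimal sketch under stated assumptions: treating the first and last bins as minima and connecting minima by straight lines are choices made here, and the function names are hypothetical.

```python
def local_minima(spectrum):
    """Indices of local minima; the two end bins are always included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus a piecewise-linear floor."""
    minima = local_minima(spectrum)
    floor = list(spectrum)
    # Connect each pair of successive minima with a straight line (assumed interpolation).
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

The residual is zero at every minimum, so each stretch between two successive minima isolates one peak.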

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (frequency bins) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20040133424A1
CLAIM 4
. A method according to claim 3 , wherein the evaluating step comprises : selecting a first peak (22) at a first frequency position (24) ;
calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value ;
identifying any second peak (28) within a given number of frequency bins (frequency bins) of the first calculated frequency position (26) ;
and allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) .
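
As a reading aid, the correlation map of claim 4 of US8990073B2 above can be sketched as follows. This is a minimal sketch under stated assumptions: the claim only requires a "normalized correlation value", so the cosine-style normalization used here, the zero score for an empty peak, and the function name are hypothetical. The `minima` argument is the list of bin indices of successive minima in the current residual spectrum.

```python
import math


def correlation_map(residual, prev_residual, minima):
    """Per-bin map holding, for each peak, its normalized correlation with the previous frame."""
    cmap = [0.0] * len(residual)
    for a, b in zip(minima, minima[1:]):
        cur = residual[a:b + 1]
        prev = prev_residual[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Score of the peak delimited by minima a and b (0.0 if either side is all zero).
        score = num / den if den > 0.0 else 0.0
        # Assign the peak's score to every bin between its two delimiting minima.
        # A shared minimum bin ends up with the score of the later peak.
        for k in range(a, b + 1):
            cmap[k] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores 1.0; a peak with no counterpart in the previous frame scores 0.0.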

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US20040133424A1
CLAIM 4
. A method according to claim 3 , wherein the evaluating step comprises : selecting a first peak (22) at a first frequency position (24) ;
calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value ;
identifying any second peak (28) within a given number of frequency bins (frequency bins) of the first calculated frequency position (26) ;
and allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) .
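
As a reading aid, the long-term correlation map of claim 5 of US8990073B2 above (one-pole filtering per frequency bin, then summation over bins) can be sketched as follows. This is a minimal sketch under stated assumptions: the filter form, the name `alpha` for the update factor, and its default value are hypothetical, and the initial long-term map is simply whatever the caller passes in for the first frame.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) update per bin, plus the sum over bins.

    long_term: long-term correlation map from the previous frame
               (the initial value on the first frame).
    cmap:      correlation map of the current frame.
    alpha:     update factor (hypothetical default).
    """
    if len(long_term) != len(cmap):
        raise ValueError("maps must have the same number of bins")
    updated = [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]
    return updated, sum(updated)
```

With a steady tone the per-bin values climb toward 1.0 frame after frame, so the summed value grows; with noise-like input the map stays low.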

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency bins) having a magnitude that exceeds a given fixed threshold .
US20040133424A1
CLAIM 4
. A method according to claim 3 , wherein the evaluating step comprises : selecting a first peak (22) at a first frequency position (24) ;
calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value ;
identifying any second peak (28) within a given number of frequency bins (frequency bins) of the first calculated frequency position (26) ;
and allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) .

US8990073B2
CLAIM 10
. A method for detecting sound activity (frequency value) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20040133424A1
CLAIM 7
. A method according to claim 6 , wherein the given number of frequency bins from the first and second calculated frequency positions within which any second or third peak is identified is ± one frequency bin , where ± represents increasing/decreasing frequency value (detecting sound activity) , such that the second or third peak may be either (i) one bin higher , (ii) at the correct bin or (iii) one bin lower than the respective calculated frequency position , and (iv) if no peaks are identified within ± one frequency bin then there is respectively no identified second or third peak ;
and the score is allocated as follows in terms of the second and third peaks : if both the peaks are at the correct bin , the score is ‘6’ ;
if one of the peaks is at the correct bin and the other peak is one bin higher or one bin lower , the score is ‘5’ ;
if both peaks are one bin higher or both peaks are one bin lower , the score is ‘4’ ;
if one peak is one bin higher and the other peak is one bin lower , the score is ‘3’ ;
if one peak is correct and there is no other peak identified , the score is ‘2’ ;
if one peak is one bin higher or one bin lower , and there is no other peak identified , the score is ‘1’ ;
and if neither peak is identified , the score is ‘0’ .
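
As a reading aid, the score allocation recited in claim 7 of US20040133424A1 above can be written as a small lookup. This is a sketch of the claim wording only: the function name and the encoding of peak positions as offsets (0 for the correct bin, +1 for one bin higher, -1 for one bin lower, `None` for no peak identified) are assumptions introduced here.

```python
def harmonic_score(second_offset, third_offset):
    """Score 0..6 from the positions of the second and third peaks.

    Each offset is 0 (correct bin), +1 (one bin higher), -1 (one bin lower),
    or None (no peak identified within one bin).
    """
    for off in (second_offset, third_offset):
        if off not in (None, -1, 0, 1):
            raise ValueError("offset must be -1, 0, +1 or None")
    found = [o for o in (second_offset, third_offset) if o is not None]
    if len(found) == 2:
        a, b = found
        if a == 0 and b == 0:
            return 6  # both peaks at the correct bin
        if a == 0 or b == 0:
            return 5  # one correct, the other one bin off
        if a == b:
            return 4  # both one bin higher, or both one bin lower
        return 3      # one higher and one lower
    if len(found) == 1:
        return 2 if found[0] == 0 else 1  # single peak: correct bin or one bin off
    return 0          # neither peak identified
```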

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (speech detector) .
US20040133424A1
CLAIM 19
. A method according to claim 18 , further comprising using a separate speech/non-speech detector (sound activity detection) to estimate whether the frame is speech or non-speech , and wherein the threshold value is varied according to whether the estimate is speech or non-speech .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20040133424A1
CLAIM 19
. A method according to claim 18 , further comprising using a separate speech/non-speech detector (sound activity detection) to estimate whether the frame is speech or non-speech , and wherein the threshold value is varied according to whether the estimate is speech or non-speech .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises comparing an average signal-to-noise ratio (SNR_av) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR_LT) .
US20040133424A1
CLAIM 19
. A method according to claim 18 , further comprising using a separate speech/non-speech detector (sound activity detection) to estimate whether the frame is speech or non-speech , and wherein the threshold value is varied according to whether the estimate is speech or non-speech .
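
As a reading aid, the comparison in claim 14 of US8990073B2 above can be sketched as follows. This is a minimal sketch under stated assumptions: the claim says only that the threshold is a function of the long-term SNR, so the linear form and the `slope` and `offset` parameters with their default values are purely hypothetical placeholders, not values taken from the patent.

```python
def snr_threshold(snr_lt, slope=0.5, offset=1.0):
    """Threshold as a function of the long-term SNR (assumed linear; values hypothetical)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt, slope=0.5, offset=1.0):
    """Classify a frame as active when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt, slope, offset)
```

Making the threshold depend on the long-term SNR lets the decision adapt to the prevailing noise conditions rather than relying on one fixed level.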

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20040133424A1
CLAIM 19
. A method according to claim 18 , further comprising using a separate speech/non-speech detector (sound activity detection) to estimate whether the frame is speech or non-speech , and wherein the threshold value is varied according to whether the estimate is speech or non-speech .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) further comprises updating the noise estimates for a next frame .
US20040133424A1
CLAIM 19
. A method according to claim 18 , further comprising using a separate speech/non-speech detector (sound activity detection) to estimate whether the frame is speech or non-speech , and wherein the threshold value is varied according to whether the estimate is speech or non-speech .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band data) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20040133424A1
CLAIM 23
. A method according to any preceding claim , further comprising using the resulting harmonic band data (frequency bands) in at least one of the following group of processes : (i) automatic speech recognition ;
(ii) front-end processing in distributed automatic speech recognition ;
(iii) speech enhancement ;
(iv) echo cancellation ;
(v) speech coding .
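
The spectral diversity computation of claim 24 of US8990073B2 can be sketched as follows. The per-band energy lists, the optional weights, the small `eps` guard against division by zero, and the function name are assumptions made for illustration; the claim itself does not specify the weights.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands above min_band."""
    curr = curr_energy[min_band + 1:]
    prev = prev_energy[min_band + 1:]
    # One ratio per frequency band whose index is higher than min_band.
    ratios = [c / (p + eps) for c, p in zip(curr, prev)]
    if weights is None:
        weights = [1.0] * len(ratios)  # assumed uniform weighting
    if len(weights) != len(ratios):
        raise ValueError("one weight per band above min_band is required")
    return sum(w * r for w, r in zip(weights, ratios))
```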

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision (significant speech) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US20040133424A1
CLAIM 22
. A method according to any preceding claim , wherein the step of identifying peaks in the spectrum comprises differentiating the frequency spectrum with respect to frequency using two scales , the first scale being over a higher number of frequency bins than the second scale , and weighting the results from the two scales such that the differentiation using the first scale identifies significant speech (binary decision) peaks and the differentiation using the second scale improves the precision of the calculation of the frequency position of the identified peak .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands (band data) into a first group of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (given number) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US20040133424A1
CLAIM 4
. A method according to claim 3 , wherein the evaluating step comprises : selecting a first peak (22) at a first frequency (first frequency) position (24) ;
calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value ;
identifying any second peak (28) within a given number (second energy values) of frequency bins of the first calculated frequency position (26) ;
and allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) .

US20040133424A1
CLAIM 23
. A method according to any preceding claim , further comprising using the resulting harmonic band data (frequency bands) in at least one of the following group of processes : (i) automatic speech recognition ;
(ii) front-end processing in distributed automatic speech recognition ;
(iii) speech enhancement ;
(iv) echo cancellation ;
(v) speech coding .
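
The noise character parameter of claim 28 of US8990073B2 can be sketched as an energy ratio between two groups of bands followed by a long-term update. The claim does not say how the long-term value is formed; the first-order recursive average, the `alpha` constant, the `eps` guard, and the function names below are assumptions for illustration only.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / (rest + eps)


def update_long_term(prev_long_term, current, alpha=0.9):
    """Assumed long-term update: first-order recursive average."""
    return alpha * prev_long_term + (1.0 - alpha) * current
```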

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20040133424A1
CLAIM 1
. A method of processing a speech signal (noise character parameter, activity prediction parameter) in noise , comprising : determining a frequency spectrum of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (following group) , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20040133424A1
CLAIM 1
. A method of processing a speech signal in noise , comprising : determining a frequency spectrum (frequency spectrum) of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US20040133424A1
CLAIM 23
. A method according to any preceding claim , further comprising using the resulting harmonic band data in at least one of the following group (update factor) of processes : (i) automatic speech recognition ;
(ii) front-end processing in distributed automatic speech recognition ;
(iii) speech enhancement ;
(iv) echo cancellation ;
(v) speech coding .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (following group) , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20040133424A1
CLAIM 1
. A method of processing a speech signal in noise , comprising : determining a frequency spectrum (frequency spectrum) of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US20040133424A1
CLAIM 23
. A method according to any preceding claim , further comprising using the resulting harmonic band data in at least one of the following group (update factor) of processes : (i) automatic speech recognition ;
(ii) front-end processing in distributed automatic speech recognition ;
(iii) speech enhancement ;
(iv) echo cancellation ;
(v) speech coding .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20040133424A1
CLAIM 1
. A method of processing a speech signal in noise , comprising : determining a frequency spectrum (frequency spectrum) of a frame of the speech signal ;
determining a value of the pitch of the frame of the speech signal ;
characterised by : identifying peaks (12 , 14 , 16 , 22 , 28 , 32) in the spectrum ;
and evaluating the peaks (12 , 14 , 16 , 22 , 28 , 32) individually to determine respective scores for the peaks (12 , 14 , 16 , 22 , 28 , 32) , the score for a peak (12 , 14 , 16 , 22 , 28 , 32) being a measure of the likelihood that the peak (12 , 14 , 16 , 22 , 28 , 32) is a harmonic band of the speech signal .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US20040133424A1
CLAIM 4
. A method according to claim 3 , wherein the evaluating step comprises : selecting a first peak (22) at a first frequency position (24) ;
calculating a first calculated frequency position (26) separated from the first frequency position in frequency by the pitch value ;
identifying any second peak (28) within a given number of frequency bins (frequency bins) of the first calculated frequency position (26) ;
and allocating a score to the first peak (22) dependent upon the relative frequency position of the second peak (28) compared to the first calculated frequency position (26) .
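
Claim 33 of US8990073B2 filters the correlation map on a bin-by-bin basis and sums the result over the frequency bins. A minimal sketch follows, assuming the filter is a first-order recursion driven by an update factor `alpha`; the filter form, the default value of `alpha`, and the function names are assumptions and are not stated in the claim.

```python
def update_long_term_map(long_term_map, correlation_map, alpha=0.9):
    """Filter each frequency bin independently (assumed first-order recursion)."""
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term_map, correlation_map)]


def summed_long_term_map(long_term_map):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term_map)
```

Starting from an initial map of zeros, repeated calls accumulate evidence only in bins whose peaks stay correlated from frame to frame, which is what makes the summed value usable as a tonal stability indicator.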

US8990073B2
CLAIM 35
. A device for detecting sound activity (frequency value) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20040133424A1
CLAIM 7
. A method according to claim 6 , wherein the given number of frequency bins from the first and second calculated frequency positions within which any second or third peak is identified is ± one frequency bin , where ± represents increasing/decreasing frequency value (detecting sound activity) , such that the second or third peak may be either (i) one bin higher , (ii) at the correct bin or (iii) one bin lower than the respective calculated frequency position , and (iv) if no peaks are identified within ± one frequency bin then there is respectively no identified second or third peak ;
and the score is allocated as follows in terms of the second and third peaks : if both the peaks are at the correct bin , the score is ‘6’ ;
if one of the peaks is at the correct bin and the other peak is one bin higher or one bin lower , the score is ‘5’ ;
if both peaks are one bin higher or both peaks are one bin lower , the score is ‘4’ ;
if one peak is one bin higher and the other peak is one bin lower , the score is ‘3’ ;
if one peak is correct and there is no other peak identified , the score is ‘2’ ;
if one peak is one bin higher or one bin lower , and there is no other peak identified , the score is ‘1’ ;
and if neither peak is identified , the score is ‘0’ .
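
The score table recited in claim 7 of US20040133424A1 can be written out as a small function. The scores follow the claim text; the input encoding (an offset of -1, 0 or +1 bins for each of the second and third peaks, or `None` when no peak was found) and the function name are assumptions made for this sketch.

```python
def harmonic_score(second_offset, third_offset):
    """Score a first peak from the bin offsets of its second and third peaks."""
    for off in (second_offset, third_offset):
        if off not in (-1, 0, 1, None):
            raise ValueError("offset must be -1, 0, +1 or None")
    found = [o for o in (second_offset, third_offset) if o is not None]
    if len(found) == 2:
        a, b = found
        if a == 0 and b == 0:
            return 6  # both peaks at the correct bin
        if a == 0 or b == 0:
            return 5  # one correct, the other one bin off
        if a == b:
            return 4  # both one bin higher, or both one bin lower
        return 3      # one higher and one lower
    if len(found) == 1:
        return 2 if found[0] == 0 else 1  # single peak: correct bin or one bin off
    return 0          # neither peak identified
```

For example, `harmonic_score(0, 1)` returns 5 and `harmonic_score(None, -1)` returns 1.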

US8990073B2
CLAIM 36
. A device for detecting sound activity (frequency value) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20040133424A1
CLAIM 7
. A method according to claim 6 , wherein the given number of frequency bins from the first and second calculated frequency positions within which any second or third peak is identified is ± one frequency bin , where ± represents increasing/decreasing frequency value (detecting sound activity) , such that the second or third peak may be either (i) one bin higher , (ii) at the correct bin or (iii) one bin lower than the respective calculated frequency position , and (iv) if no peaks are identified within ± one frequency bin then there is respectively no identified second or third peak ;
and the score is allocated as follows in terms of the second and third peaks : if both the peaks are at the correct bin , the score is ‘6’ ;
if one of the peaks is at the correct bin and the other peak is one bin higher or one bin lower , the score is ‘5’ ;
if both peaks are one bin higher or both peaks are one bin lower , the score is ‘4’ ;
if one peak is one bin higher and the other peak is one bin lower , the score is ‘3’ ;
if one peak is correct and there is no other peak identified , the score is ‘2’ ;
if one peak is one bin higher or one bin lower , and there is no other peak identified , the score is ‘1’ ;
and if neither peak is identified , the score is ‘0’ .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (respective peak) to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US20040133424A1
CLAIM 12
. A method according to any of claims 8 to 11 , wherein the given number of frequency bins which the respective peaks (average signal) are required to be within the respective calculated frequency position is ± one frequency bin , where ± represents increasing/decreasing frequency value , such that the respective peak may be either at the respective calculated frequency position in which case the peak is allocated a relatively higher score or ± one frequency bin of the respective calculated frequency position in which case the peak is allocated a relatively lower score .




US6988064B2

Filed: 2003-03-31     Issued: 2006-01-17

System and method for combined frequency-domain and time-domain pitch extraction for speech signals

(Original Assignee) International Business Machines Corp; Motorola Solutions Inc     (Current Assignee) International Business Machines Corp ; Google Technology Holdings LLC

Tenkasi V. Ramabadran, Alexander Sorin
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (correlation value) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6988064B2
CLAIM 1
. A method comprising : sampling a speech signal ;
dividing the sampled speech signal into overlapping frames ;
extracting first pitch information from a frame using frequency domain analysis ;
providing at least one pitch candidate , each being coupled with a spectral score , from the first pitch information , each of the at least one pitch candidate representing a possible pitch estimate for the frame ;
determining second pitch information for the frame by calculating time domain correlation values (correlation value) at lag values selected based upon each of the at least one pitch candidate ;
providing a correlation score for each of the at least one pitch candidate within the second pitch information ;
and selecting one of the at least one pitch candidate as a pitch estimate of the frame .
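
For comparison with the time-domain correlation of US6988064B2, the per-peak correlation map of claim 4 of US8990073B2 can be sketched as follows. The exact normalization is not given in the claim; the cosine-style normalization below, the function name, and the convention that the minima are passed in as a sorted list of bin indices are assumptions for illustration.

```python
import math


def correlation_map(curr_residual, prev_residual, minima):
    """Assign each peak's normalized correlation to every bin between its minima."""
    cmap = [0.0] * len(curr_residual)
    # Each pair of consecutive minima delimits one peak of the current residual.
    for lo, hi in zip(minima, minima[1:]):
        c = curr_residual[lo:hi + 1]
        p = prev_residual[lo:hi + 1]
        num = sum(a * b for a, b in zip(c, p))
        den = math.sqrt(sum(a * a for a in c) * sum(b * b for b in p))
        score = num / den if den > 0.0 else 0.0
        # A minimum bin shared by two peaks takes the score of the later peak.
        for k in range(lo, hi + 1):
            cmap[k] = score
    return cmap
```

The correlation here is taken across frames in the frequency domain, bin by bin within each peak, rather than across lag values within one frame.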

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
US6988064B2
CLAIM 3
. The method of claim 2 , wherein the selecting comprises : computing a corresponding match measure for each of the at least one of pitch candidate and a selected pitch estimate for a previous frame (activity prediction parameter) ;
and ;
selecting the pitch estimate as the at least one pitch candidate that is associated with the best combination of spectral score , correlation score and match measure , thereby indicating the one pitch candidate with the best probability of matching the pitch of the frame .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6988064B2
CLAIM 3
. The method of claim 2 , wherein the selecting comprises : computing a corresponding match measure for each of the at least one of pitch candidate and a selected pitch estimate for a previous frame (activity prediction parameter) ;
and ;
selecting the pitch estimate as the at least one pitch candidate that is associated with the best combination of spectral score , correlation score and match measure , thereby indicating the one pitch candidate with the best probability of matching the pitch of the frame .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US6988064B2
CLAIM 3
. The method of claim 2 , wherein the selecting comprises : computing a corresponding match measure for each of the at least one of pitch candidate and a selected pitch estimate for a previous frame (activity prediction parameter) ;
and ;
selecting the pitch estimate as the at least one pitch candidate that is associated with the best combination of spectral score , correlation score and match measure , thereby indicating the one pitch candidate with the best probability of matching the pitch of the frame .




US20040181393A1

Filed: 2003-03-14     Issued: 2004-09-16

Tonal analysis for perceptual audio coding using a compressed spectral representation

(Original Assignee) Agere Systems LLC     (Current Assignee) MUCH SHELIST FREED DENENBERG ARNENT & RUBENSTEIN PC ; Avago Technologies International Sales Pte Ltd

Frank Baumgarte
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency transformation of the sampled input audio signal into a frequency domain representation (frequency spectrum) ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .
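
The three steps of claim 13 of US20040181393A1 (a first frequency transformation, a logarithm, and a second frequency transformation) resemble a cepstrum computation. A minimal sketch follows, assuming a naive DFT for the first transform, the natural logarithm of the magnitude, and an inverse DFT for the second transform (one of the options listed in claim 16 of that reference); the function names and the `eps` guard are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def compressed_spectral_representation(samples, eps=1e-12):
    """First transform, log of the magnitude, then an inverse transform."""
    spectrum = dft(samples)
    log_magnitude = [math.log(abs(v) + eps) for v in spectrum]
    return [v.real for v in dft(log_magnitude, inverse=True)]
```

Nothing in this pipeline subtracts a spectral floor or locates peaks between minima, which is the operation the target claim recites; the chart maps only the term "frequency spectrum".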

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (domain representation) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency transformation of the sampled input audio signal into a frequency domain representation (frequency spectrum) ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .
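
The residual spectrum of claim 2 of US8990073B2 (search for minima, connect them to form a spectral floor, subtract the floor) can be sketched as follows. Treating the two end bins as minima, connecting minima by straight lines, and the function names are assumptions made for illustration; the claim says only that the minima are connected with each other.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are always included (assumption)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the spectral minima."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)  # position between the two minima, 0..1
            floor[k] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return [s - f for s, f in zip(spectrum, floor)]
```

For the spectrum `[1, 3, 1, 4, 2]` the minima are at bins 0, 2 and 4, and the residual is `[0.0, 2.0, 0.0, 2.5, 0.0]`: each peak stands on a zero baseline between its delimiting minima.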

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (inverse discrete cosine) .
US20040181393A1
CLAIM 16
. The method of claim 13 , wherein the second frequency transformation is an inverse Fourier transformation , an inverse Fast Fourier Transformation (FFT) , an inverse discrete cosine (linear prediction residual error energies) transformation , or an inverse z-transformation .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency (first frequency) transformation of the sampled input audio signal into a frequency domain representation ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency transformation of the sampled input audio signal into a frequency domain representation (frequency spectrum) ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency transformation of the sampled input audio signal into a frequency domain representation (frequency spectrum) ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (domain representation) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20040181393A1
CLAIM 13
. The method of claim 1 , wherein step (b) further comprises : performing a first frequency transformation of the sampled input audio signal into a frequency domain representation (frequency spectrum) ;
applying a logarithmic operation to the frequency domain representation to form a logarithmic representation ;
and performing a second frequency transformation of the logarithmic representation to form the compressed spectral representation .
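
The three steps recited in claim 13 of US20040181393A1 (first frequency transformation, logarithm, second frequency transformation) resemble a cepstrum-style computation. The following is a minimal sketch under stated assumptions, not the reference's actual implementation: the function names `dft` and `compressed_spectral_representation`, the use of a naive DFT for both transformations, the natural logarithm of the magnitude, and the small `eps` guard are all illustrative choices that the claim does not specify.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def compressed_spectral_representation(samples, eps=1e-12):
    """Hypothetical helper: transform -> log magnitude -> second transform."""
    # Step 1: first frequency transformation (assumed here to be a DFT).
    spectrum = dft(samples)
    # Step 2: logarithmic operation on the magnitude (eps avoids log(0)).
    log_magnitude = [math.log(abs(c) + eps) for c in spectrum]
    # Step 3: second frequency transformation of the logarithmic representation.
    second = dft(log_magnitude)
    n = len(samples)
    # Keep the real part, scaled by 1/n (an assumed normalization).
    return [c.real / n for c in second]
```

For a scaled unit impulse the spectrum is flat, so only the first output coefficient is non-zero in this sketch.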




US7124075B2

Filed: 2002-05-07     Issued: 2006-10-17

Methods and apparatus for pitch determination

(Original Assignee) Dmitry Edward Terez     

Dmitry Edward Terez
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (sample values) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .
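
The steps recited in claim 2 of US8990073B2 (searching for minima, connecting them into a spectral floor, subtracting the floor) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the names `find_minima`, `spectral_floor` and `residual_spectrum` are hypothetical, "connecting" is taken to mean linear interpolation, and the first and last bins are simply treated as minima so that every bin is covered.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are treated as minima (assumption)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i - 1] > spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return minima


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor; returns the residual and the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

Each piece of the residual between two successive minima then corresponds to one "peak" in the sense of claim 1. The sketch requires at least two bins.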

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (sample values) comprises locating a maximum between each pair of two consecutive minima (singular value decomposition) of the current residual spectrum .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US7124075B2
CLAIM 9
. The method of claim 1 , wherein said embedding is singular value decomposition (consecutive minima, two consecutive minima) embedding .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (sample values) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (said subset) between two consecutive minima (singular value decomposition) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US7124075B2
CLAIM 9
. The method of claim 1 , wherein said embedding is singular value decomposition (consecutive minima, two consecutive minima) embedding .

US7124075B2
CLAIM 12
. The method of claim 1 , wherein said plurality of possible pairs of m-dimensional vectors is a sub-set of all possible non-repeating combinations of two vectors from said sequence of m-dimensional vectors , wherein said subset (frequency bins) is generated by : selecting a subsequence of vectors from said sequence of m-dimensional vectors , said subsequence including a predetermined number of vectors less than the number of vectors in said sequence of m-dimensional vectors ;
shifting said subsequence relative to said sequence of m-dimensional vectors by each of a plurality of possible time separation values ;
and matching vectors in said shifted subsequence with vectors in said sequence of m-dimensional vectors to form pairs of m-dimensional vectors , one element of each pair being from the shifted subsequence and one element being from said sequence of m-dimensional vectors .
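
Claim 4 of US8990073B2 describes scoring each peak by a normalized correlation with the previous residual spectrum and spreading that score over the bins that delimit the peak. A minimal sketch follows, assuming a cosine-style normalization (the claim does not fix the exact formula) and a hypothetical function name `correlation_map`; the list of minima is supplied by the caller.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding the normalized correlation of the enclosing peak."""
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        cross = sum(current[i] * previous[i] for i in bins)
        energy = sum(current[i] ** 2 for i in bins) * sum(previous[i] ** 2 for i in bins)
        # Assumed normalization; a zero-energy segment scores 0.
        score = cross / math.sqrt(energy) if energy > 0 else 0.0
        # A minimum shared by two peaks takes the score of the later peak.
        for i in bins:
            cmap[i] = score
    return cmap
```

With identical current and previous residual spectra every peak scores 1 in this sketch, which matches the intuition that stable tones correlate strongly from frame to frame.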

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (said subset) so as to produce a summed long-term correlation map .
US7124075B2
CLAIM 12
. The method of claim 1 , wherein said plurality of possible pairs of m-dimensional vectors is a sub-set of all possible non-repeating combinations of two vectors from said sequence of m-dimensional vectors , wherein said subset (frequency bins) is generated by : selecting a subsequence of vectors from said sequence of m-dimensional vectors , said subsequence including a predetermined number of vectors less than the number of vectors in said sequence of m-dimensional vectors ;
shifting said subsequence relative to said sequence of m-dimensional vectors by each of a plurality of possible time separation values ;
and matching vectors in said shifted subsequence with vectors in said sequence of m-dimensional vectors to form pairs of m-dimensional vectors , one element of each pair being from the shifted subsequence and one element being from said sequence of m-dimensional vectors .
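
The long-term correlation map of claims 1 and 5 of US8990073B2 (update factor, current-frame correlation map, initial value, one-pole filtering per bin, then summation) can be illustrated as below. This is a sketch only: the names `update_long_term_map` and `summed_long_term_map`, the symbol `alpha` for the update factor, and the specific recursion form are assumptions.

```python
def update_long_term_map(long_term, cmap, alpha):
    """One-pole (first-order recursive) smoothing applied bin by bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum of the filtered map over all frequency bins."""
    return sum(long_term)


# Example: start from an all-zero initial value and feed two identical frames.
state = [0.0, 0.0, 0.0]
for _ in range(2):
    state = update_long_term_map(state, [1.0, 1.0, 0.0], alpha=0.9)
total = summed_long_term_map(state)
```

A larger `alpha` makes the map react more slowly, so only tones that persist over many frames build up a large summed value.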

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (said subset) having a magnitude that exceeds a given fixed threshold .
US7124075B2
CLAIM 12
. The method of claim 1 , wherein said plurality of possible pairs of m-dimensional vectors is a sub-set of all possible non-repeating combinations of two vectors from said sequence of m-dimensional vectors , wherein said subset (frequency bins) is generated by : selecting a subsequence of vectors from said sequence of m-dimensional vectors , said subsequence including a predetermined number of vectors less than the number of vectors in said sequence of m-dimensional vectors ;
shifting said subsequence relative to said sequence of m-dimensional vectors by each of a plurality of possible time separation values ;
and matching vectors in said shifted subsequence with vectors in said sequence of m-dimensional vectors to form pairs of m-dimensional vectors , one element of each pair being from the shifted subsequence and one element being from said sequence of m-dimensional vectors .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (predetermined threshold value) indicative of sound activity in the sound signal .
US7124075B2
CLAIM 25
. The method of claim 1 , wherein said step of locating at least a highest peak further comprises : locating all peaks exceeding a predetermined threshold value (adaptive threshold) .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (ordered set) .
US7124075B2
CLAIM 20
. The method of claim 19 , wherein said step of selecting a predetermined number of vector pairs further comprises : computing a distance between m-dimensional vectors for each pair of vectors in the plurality of possible pairs of m-dimensional vectors ;
ordering the pairs as a function of the computed distances to form an ordered set (SNR calculation) ;
and selecting the predetermined number of vector pairs from the ordered set .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (selected pairs) .
US7124075B2
CLAIM 51
. A method for determining a fundamental period of a portion of a signal , comprising the steps of : forming m-dimensional vectors x(i) from a sequence of signal samples , where i is an integer index ;
selecting pairs of vectors {x(i) , x(i+k)} with smallest distances D[x(i) , x(i+k)] between vectors from a plurality of possible pairs of said m-dimensional vectors , where k is an integer time separation value ;
computing a histogram of the distribution of the time separation values k for the selected pairs (next frame) of vectors ;
and searching said histogram for at least one peak to determine the fundamental period of said portion of said signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (selected pairs) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US7124075B2
CLAIM 51
. A method for determining a fundamental period of a portion of a signal , comprising the steps of : forming m-dimensional vectors x(i) from a sequence of signal samples , where i is an integer index ;
selecting pairs of vectors {x(i) , x(i+k)} with smallest distances D[x(i) , x(i+k)] between vectors from a plurality of possible pairs of said m-dimensional vectors , where k is an integer time separation value ;
computing a histogram of the distribution of the time separation values k for the selected pairs (next frame) of vectors ;
and searching said histogram for at least one peak to determine the fundamental period of said portion of said signal .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (Euclidean distance) parameter (linear transformation, speech signal, said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US7124075B2
CLAIM 17
. The method of claim 13 , wherein said distance between vectors is one of a Euclidean distance (noise character) and a squared Euclidean distance in m-dimensional space .
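
For orientation, the pitch method recited in claims 1 and 17 of US7124075B2 (state-space embedding, selection of closest vector pairs, histogram of time separations, peak search) can be sketched as follows. All names and defaults (`embed`, `pitch_period`, `m`, `delay`, `min_lag`, `max_lag`, `keep`) are hypothetical, and the squared Euclidean distance is just one of the options named in claim 17.

```python
from collections import Counter


def embed(samples, m, delay=1):
    """Time-delay embedding into m-dimensional vectors."""
    count = len(samples) - (m - 1) * delay
    return [tuple(samples[i + j * delay] for j in range(m)) for i in range(count)]


def pitch_period(samples, m=3, delay=1, min_lag=2, max_lag=20, keep=50):
    """Lag at the highest histogram peak among the closest vector pairs."""
    vectors = embed(samples, m, delay)
    pairs = []
    for i in range(len(vectors)):
        for k in range(min_lag, max_lag + 1):
            if i + k < len(vectors):
                # Squared Euclidean distance between x(i) and x(i+k).
                d = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[i + k]))
                pairs.append((d, k))
    # Keep the `keep` closest pairs and histogram their time separations.
    pairs.sort(key=lambda p: p[0])
    histogram = Counter(k for _, k in pairs[:keep])
    return max(histogram, key=histogram.get)
```

The sketch operates on time-domain samples and lags, which is worth keeping in mind when the chart maps these terms onto the frequency-domain elements of US8990073B2.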

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (linear transformation, speech signal, said time) indicative of an activity of the sound signal .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (linear transformation, speech signal, said time) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (linear transformation, speech signal, said time) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (Euclidean distance) parameter (linear transformation, speech signal, said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US7124075B2
CLAIM 17
. The method of claim 13 , wherein said distance between vectors is one of a Euclidean distance (noise character) and a squared Euclidean distance in m-dimensional space .
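
The noise character parameter of claim 28 of US8990073B2 (two groups of frequency bands, an energy value per group, their ratio, and a long-term value) can be sketched as below. This is an assumption-laden illustration: the names `noise_character` and `long_term_noise_character`, the use of summed band energies, and the first-order smoothing with factor `alpha` are not taken from the claim.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    # Guard against an empty or silent second group.
    return first / second if second > 0 else float("inf")


def long_term_noise_character(previous, current, alpha=0.9):
    """Assumed first-order smoothing of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * current


# Example: four bands, the first two forming the first group.
ratio = noise_character([1.0, 1.0, 2.0, 4.0], n_first=2)
smoothed = long_term_noise_character(1.0, ratio)
```

Claim 29 then compares such a parameter against a fixed threshold to decide whether the noise energy estimates may be updated.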

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (Euclidean distance) parameter (linear transformation, speech signal, said time) inferior than a given fixed threshold .
US7124075B2
CLAIM 1
. A method for determining the pitch of a sampled digitized speech signal (noise character parameter, activity prediction parameter) , comprising the steps of : embedding a portion of the sampled digitized speech signal into an m-dimensional state space to obtain a sequence of m-dimensional vectors ;
selecting closest pairs of vectors in state space from a plurality of possible pairs of m-dimensional vectors in said sequence of m-dimensional vectors ;
accumulating a total number of the selected closest pairs of vectors for each of a plurality of time separation values to produce a histogram of accumulated numbers ;
and locating at least a highest peak in a portion of said histogram to obtain a pitch period value for said portion of the sampled digitized speech signal .

US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values to a predetermined range of values prior to performing said time (noise character parameter, activity prediction parameter) -delay embedding .

US7124075B2
CLAIM 14
. The method of claim 1 , wherein said sequence of m-dimensional vectors defines a trajectory in m-dimensional state space , the method further comprising the step of : performing a linear transformation (noise character parameter, activity prediction parameter) on each dimension of said trajectory to scale said trajectory to a predetermined size prior to performing said selecting step .

US7124075B2
CLAIM 17
. The method of claim 13 , wherein said distance between vectors is one of a Euclidean distance (noise character) and a squared Euclidean distance in m-dimensional space .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (sample values) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7124075B2
CLAIM 5
. The method of claim 4 , further comprising normalizing sample values (current residual spectrum) to a predetermined range of values prior to performing said time-delay embedding .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (said subset) so as to produce a summed long-term correlation map .
US7124075B2
CLAIM 12
. The method of claim 1 , wherein said plurality of possible pairs of m-dimensional vectors is a sub-set of all possible non-repeating combinations of two vectors from said sequence of m-dimensional vectors , wherein said subset (frequency bins) is generated by : selecting a subsequence of vectors from said sequence of m-dimensional vectors , said subsequence including a predetermined number of vectors less than the number of vectors in said sequence of m-dimensional vectors ;
shifting said subsequence relative to said sequence of m-dimensional vectors by each of a plurality of possible time separation values ;
and matching vectors in said shifted subsequence with vectors in said sequence of m-dimensional vectors to form pairs of m-dimensional vectors , one element of each pair being from the shifted subsequence and one element being from said sequence of m-dimensional vectors .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (Euclidean distance) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US7124075B2
CLAIM 17
. The method of claim 13 , wherein said distance between vectors is one of a Euclidean distance (noise character) and a squared Euclidean distance in m-dimensional space .




CN1543639A

Filed: 2001-12-04     Issued: 2004-11-03

Method and apparatus for robust speech classification

(Original Assignee) Qualcomm Incorporated

P. Huang
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1543639A
CLAIM 1
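. A speech [一种语] (update factor) classification method , characterized by comprising : inputting classification parameters to a speech classifier from an external component ; generating , in the speech classifier , internal classification parameters from at least one of the input parameters ; setting a normalized autocorrelation coefficient function threshold and selecting a parameter analyzer according to a signal environment ; and analyzing the input parameters and the internal parameters to produce a speech mode classification .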

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update (经更新) of noise energy estimates when a tonal sound signal is detected .
CN1543639A
CLAIM 28
. The method of claim 27 , wherein the updated [经更新] (update decision, preventing update) parameter comprises a normalized autocorrelation coefficient function at pitch parameter .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision (经更新) based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (音调信息) .
CN1543639A
CLAIM 7
. The method of claim 1 , wherein the input parameters comprise a normalized autocorrelation coefficient function at pitch information [音调信息] (linear prediction residual error energies) .

CN1543639A
CLAIM 28
. The method of claim 27 , wherein the updated [经更新] (update decision, preventing update) parameter comprises a normalized autocorrelation coefficient function at pitch parameter .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (至少一个参数) indicative of an activity of the sound signal .
CN1543639A
CLAIM 27
. The method of claim 1 , further comprising updating at least one parameter [至少一个参数] (activity prediction parameter) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (至少一个参数) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1543639A
CLAIM 27
. The method of claim 1 , further comprising updating at least one parameter [至少一个参数] (activity prediction parameter) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (至少一个参数) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
CN1543639A
CLAIM 27
. The method of claim 1 , further comprising updating at least one parameter [至少一个参数] (activity prediction parameter) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1543639A
CLAIM 1
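. A [一种语] (update factor) method of speech classification, comprising: inputting classification parameters to a speech classifier from external components; generating, in the speech classifier, internal classification parameters from at least one of the input parameters; setting a normalized autocorrelation coefficient function threshold and selecting a parameter analyzer according to a signal environment; and analyzing the input parameters and the internal parameters to produce a speech mode classification.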

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1543639A
CLAIM 1
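. A [一种语] (update factor) method of speech classification, comprising: inputting classification parameters to a speech classifier from external components; generating, in the speech classifier, internal classification parameters from at least one of the input parameters; setting a normalized autocorrelation coefficient function threshold and selecting a parameter analyzer according to a signal environment; and analyzing the input parameters and the internal parameters to produce a speech mode classification.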

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update (经更新) of noise energy estimates .
CN1543639A
CLAIM 28
. The method of claim 27, wherein the updated [经更新] (update decision, preventing update) parameters include a normalized autocorrelation coefficient function at a pitch parameter.




CN1447963A

Filed: 2001-08-17     Issued: 2003-10-08

Noise-robust classification method in speech coding (语音编码中噪音鲁棒分类方法)

(Original Assignee) Conexant Systems, Inc. (康奈克森特系统公司)

J·塞斯
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1447963A
CLAIM 19
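. A [一种语] (update factor) method of speech coding, whereby a set of homogeneous parameters is provided for classifying a signal, the set of parameters being unaffected by background noise.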

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (一个阈值) indicative of sound activity in the sound signal .
CN1447963A
CLAIM 8
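. A method for speech classification, comprising the steps of: (a) receiving a speech-related signal at a processing unit; (b) providing at least one parameter for classifying the signal; (c) estimating a noise component of the parameter; (d) removing the noise component of the parameter; (e) comparing the parameter with a set of thresholds that includes at least one threshold [一个阈值] (adaptive threshold); and (f) associating the signal with a class in response to the comparing step.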

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (语音处理) .
CN1447963A
CLAIM 20
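. A method for speech communication whereby the influence of speech-related noise can be reduced, the method comprising the steps of: (a) receiving a digital speech-related signal at a speech processing [语音处理] (SNR calculation) device; (b) forming a set of homogeneous parameters; (c) comparing the parameters with a threshold; and (d) classifying the signal.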

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (这些参数) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
CN1447963A
CLAIM 15
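. A method for perceptual matching of a speech signal in a speech coding device having at least one processing module, the method comprising the steps of: (a) receiving the signal at the speech coding device; (b) deriving a plurality of signal parameters in the processing module; (c) weighting these parameters [这些参数] (noise character parameter); (d) associating specific signal characteristics with the signal parameters; (e) setting a flag in the processing module when the characteristic is identified; (f) comparing the flags; and (g) classifying the signal according to one of the comparing step or the deriving step.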

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (中设置) .
CN1447963A
CLAIM 15
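. A method for perceptual matching of a speech signal in a speech coding device having at least one processing module, the method comprising the steps of: (a) receiving the signal at the speech coding device; (b) deriving a plurality of signal parameters in the processing module; (c) weighting these parameters; (d) associating specific signal characteristics with the signal parameters; (e) setting [中设置] (average frame energy) a flag in the processing module when the characteristic is identified; (f) comparing the flags; and (g) classifying the signal according to one of the comparing step or the deriving step.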

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (至少一个参数) indicative of an activity of the sound signal .
CN1447963A
CLAIM 8
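. A method for speech classification, comprising the steps of: (a) receiving a speech-related signal at a processing unit; (b) providing at least one parameter [至少一个参数] (activity prediction parameter) for classifying the signal; (c) estimating a noise component of the parameter; (d) removing the noise component of the parameter; (e) comparing the parameter with a set of thresholds that includes at least one threshold; and (f) associating the signal with a class in response to the comparing step.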

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (至少一个参数) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1447963A
CLAIM 8
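. A method for speech classification, comprising the steps of: (a) receiving a speech-related signal at a processing unit; (b) providing at least one parameter [至少一个参数] (activity prediction parameter) for classifying the signal; (c) estimating a noise component of the parameter; (d) removing the noise component of the parameter; (e) comparing the parameter with a set of thresholds that includes at least one threshold; and (f) associating the signal with a class in response to the comparing step.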

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (至少一个参数) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
CN1447963A
CLAIM 8
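. A method for speech classification, comprising the steps of: (a) receiving a speech-related signal at a processing unit; (b) providing at least one parameter [至少一个参数] (activity prediction parameter) for classifying the signal; (c) estimating a noise component of the parameter; (d) removing the noise component of the parameter; (e) comparing the parameter with a set of thresholds that includes at least one threshold; and (f) associating the signal with a class in response to the comparing step.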

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (这些参数) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1447963A
CLAIM 15
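. A method for perceptual matching of a speech signal in a speech coding device having at least one processing module, the method comprising the steps of: (a) receiving the signal at the speech coding device; (b) deriving a plurality of signal parameters in the processing module; (c) weighting these parameters [这些参数] (noise character parameter); (d) associating specific signal characteristics with the signal parameters; (e) setting a flag in the processing module when the characteristic is identified; (f) comparing the flags; and (g) classifying the signal according to one of the comparing step or the deriving step.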

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (这些参数) inferior than a given fixed threshold .
CN1447963A
CLAIM 15
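. A method for perceptual matching of a speech signal in a speech coding device having at least one processing module, the method comprising the steps of: (a) receiving the signal at the speech coding device; (b) deriving a plurality of signal parameters in the processing module; (c) weighting these parameters [这些参数] (noise character parameter); (d) associating specific signal characteristics with the signal parameters; (e) setting a flag in the processing module when the characteristic is identified; (f) comparing the flags; and (g) classifying the signal according to one of the comparing step or the deriving step.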

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1447963A
CLAIM 19
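. A [一种语] (update factor) method of speech coding, whereby a set of homogeneous parameters is provided for classifying a signal, the set of parameters being unaffected by background noise.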

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1447963A
CLAIM 19
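. A [一种语] (update factor) method of speech coding, whereby a set of homogeneous parameters is provided for classifying a signal, the set of parameters being unaffected by background noise.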




CN1624766A

Filed: 2001-08-17     Issued: 2005-06-08

Noise-robust classification method in speech coding (语音编码中噪音鲁棒分类方法)

(Original Assignee) Conexant Systems, Inc. (康奈克森特系统公司)

J·塞斯
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (预定阈值, 阈值进行) indicative of sound activity in the sound signal .
CN1624766A
CLAIM 1
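. A method for classifying a speech signal that contains a background noise portion having a background noise level, the method comprising the steps of: extracting a parameter from the speech signal; estimating a noise component of the parameter; removing the noise component from the parameter to generate a noise-free parameter; selecting a predetermined threshold [预定阈值] (adaptive threshold), wherein the step of selecting the predetermined threshold is unaffected by the background noise level; comparing the noise-free parameter with the predetermined threshold; and associating the speech signal with a class in response to the comparing step.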

CN1624766A
CLAIM 17
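. A speech coding device for classifying a speech signal that contains a background noise portion having a background noise level, the speech coding device comprising: a parameter extraction module configured to extract a parameter from the speech signal for classifying the speech signal; a parameter estimation module configured to estimate a noise component of the parameter; a noise removal module configured to remove the speech component from the parameter to generate a noise-free parameter; a comparison module configured to compare the noise-free parameter with a predetermined threshold [阈值进行] (adaptive threshold), wherein the predetermined threshold is unaffected by the background noise level; and a classification module configured to associate the speech signal with a class in response to the comparison module.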

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (多个参数) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
CN1624766A
CLAIM 16
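. The method according to claim 14, wherein the plurality of parameters [多个参数] (noise character parameter) include a spectral tilt parameter, a pitch correlation parameter, and an absolute maximum parameter.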

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (至少一个参数) indicative of an activity of the sound signal .
CN1624766A
CLAIM 4
. The method according to claim 1, wherein at least one parameter [至少一个参数] (activity prediction parameter) is derived to classify the signal.

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (至少一个参数) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1624766A
CLAIM 4
. The method according to claim 1, wherein at least one parameter [至少一个参数] (activity prediction parameter) is derived to classify the signal.

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (至少一个参数) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
CN1624766A
CLAIM 4
. The method according to claim 1, wherein at least one parameter [至少一个参数] (activity prediction parameter) is derived to classify the signal.

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (多个参数) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1624766A
CLAIM 16
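. The method according to claim 14, wherein the plurality of parameters [多个参数] (noise character parameter) include a spectral tilt parameter, a pitch correlation parameter, and an absolute maximum parameter.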

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (多个参数) inferior than a given fixed threshold .
CN1624766A
CLAIM 16
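. The method according to claim 14, wherein the plurality of parameters [多个参数] (noise character parameter) include a spectral tilt parameter, a pitch correlation parameter, and an absolute maximum parameter.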




US6636829B1

Filed: 2000-07-14     Issued: 2003-10-21

Speech communication system and method for handling lost frames

(Original Assignee) Mindspeed Technologies LLC     (Current Assignee) HTC Corp ; WIAV Solutions LLC

Adil Benyassine, Eyal Shlomot, Huan-Yu Su
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map (decoding method) .
US6636829B1
CLAIM 10
. The decoder of claim 9 wherein the frame recovery logic sets the minimum spacing for the lost frame based also at least in part on the frequency spectrum (frequency spectrum) of the speech signal .

US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .

US6636829B1
CLAIM 63
. A decoding method (second group, term correlation map, second energy values) comprising the steps of : receiving parameters of a speech signal on a frame-by-frame basis , the parameters including a line spectral frequency (LSF) for each frame ;
decoding the parameters on the frame-by-frame basis to reproduce the speech signal , wherein the decoding step uses a minimum spacing indicative of a minimum difference required between the LSFs of consecutive frames ;
detecting a lost frame ;
and setting the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6636829B1
CLAIM 10
. The decoder of claim 9 wherein the frame recovery logic sets the minimum spacing for the lost frame based also at least in part on the frequency spectrum (frequency spectrum) of the speech signal .

US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .
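
Illustrative note on US8990073B2 claim 2 (not claim language): a minimal Python sketch of searching for the minima, connecting them into a spectral floor, and subtracting that floor to obtain the residual spectrum. It assumes that "connecting the minima" means piecewise-linear interpolation and that the two edge bins are treated as minima; the function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; both edge bins are included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Floor obtained by joining consecutive minima with straight lines."""
    floor = [float(s) for s in spectrum]
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima) for one frame of spectral values."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

By construction the residual is zero at every minimum, so the pieces between successive minima stand out as candidate peaks.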

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (minimum difference) of the current residual spectrum .
US6636829B1
CLAIM 1
. A decoder for a speech communication system , the decoder comprising : a receiver that receives parameters of a speech signal to be decoded , the parameters being received on a frame-by-frame basis , the parameters including a line spectral frequency (LSF) for each frame ;
a control logic coupled to the receiver for decoding the parameters and for resynthesizing the speech signal , the control logic including a minimum spacing indicative of a minimum difference (consecutive minima) required between the LSFs of consecutive frames ;
a lost frame detector that detects a lost frame ;
and a frame recovery logic that , when the lost frame detector detects the lost frame , sets the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame .

US6636829B1
CLAIM 31
. The decoder of claim 29 further comprising a periodic signal detector (two consecutive minima) that determines whether the speech signal is periodic , wherein if the lost frame contained nonperiodic-like speech and the gain parameter of the subframe of the lost frame is a fixed codebook gain parameter , then the frame recovery logic sets the fixed codebook gain parameter of the first subframe of the lost frame to zero .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (minimum difference) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6636829B1
CLAIM 1
. A decoder for a speech communication system , the decoder comprising : a receiver that receives parameters of a speech signal to be decoded , the parameters being received on a frame-by-frame basis , the parameters including a line spectral frequency (LSF) for each frame ;
a control logic coupled to the receiver for decoding the parameters and for resynthesizing the speech signal , the control logic including a minimum spacing indicative of a minimum difference (consecutive minima) required between the LSFs of consecutive frames ;
a lost frame detector that detects a lost frame ;
and a frame recovery logic that , when the lost frame detector detects the lost frame , sets the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame .

US6636829B1
CLAIM 31
. The decoder of claim 29 further comprising a periodic signal detector (two consecutive minima) that determines whether the speech signal is periodic , wherein if the lost frame contained nonperiodic-like speech and the gain parameter of the subframe of the lost frame is a fixed codebook gain parameter , then the frame recovery logic sets the fixed codebook gain parameter of the first subframe of the lost frame to zero .
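
Illustrative note on US8990073B2 claims 3 and 4 (not claim language): a minimal Python sketch that treats each piece of the current residual spectrum between two consecutive minima as a peak, scores it by a normalized correlation with the previous residual spectrum over the same bins, and writes that score into every bin of the peak. The specific normalization (cross product divided by the geometric mean of the two energies), the edge-bin handling, and the function names are assumptions.

```python
import math


def residual_minima(residual):
    """Indices of local minima; both edge bins are included (assumption)."""
    n = len(residual)
    idx = [0]
    for i in range(1, n - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def correlation_map(cur, prev):
    """Per-bin map holding the normalized correlation score of each peak."""
    minima = residual_minima(cur)
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur[i] * prev[i] for i in bins)
        den = math.sqrt(sum(cur[i] ** 2 for i in bins) *
                        sum(prev[i] ** 2 for i in bins))
        score = num / den if den > 0.0 else 0.0
        # A bin shared by two neighbouring peaks keeps the later peak's score.
        for i in bins:
            cmap[i] = score
    return cmap
```

A peak that keeps its shape and position from one frame to the next scores close to 1, while a peak with no counterpart in the previous frame scores 0.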

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (periodic signal) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6636829B1
CLAIM 31
. The decoder of claim 29 further comprising a periodic signal (frequency bin) detector that determines whether the speech signal is periodic , wherein if the lost frame contained nonperiodic-like speech and the gain parameter of the subframe of the lost frame is a fixed codebook gain parameter , then the frame recovery logic sets the fixed codebook gain parameter of the first subframe of the lost frame to zero .
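
Illustrative note on US8990073B2 claim 5 (not claim language): a minimal Python sketch of a one-pole filter applied to the correlation map bin by bin, followed by a sum over the bins. The filter form, the update factor `alpha`, the all-zero initial map, and the function names are assumptions; the claims recite only an update factor, the current frame's correlation map, and an initial value.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing, applied bin by bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cmap)]


def summed_long_term_map(lt_map):
    """Sum of the filtered map over all frequency bins."""
    return sum(lt_map)


# Usage: start from an assumed all-zero initial map and feed two frames.
lt = [0.0, 0.0]
for frame_cmap in ([1.0, 0.5], [1.0, 0.5]):
    lt = update_long_term_map(lt, frame_cmap, alpha=0.5)
total = summed_long_term_map(lt)
```

With a stable tone the per-bin values climb toward the correlation score over successive frames, so the summed map grows; how that sum is then compared with an adaptive threshold (claim 8) is not sketched here.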

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (gain codebook) in the sound signal .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 10
. A method for detecting sound activity (gain codebook) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (gain codebook) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (gain codebook) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (gain codebook) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .
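
Illustrative note on US8990073B2 claim 14 (not claim language): a minimal Python sketch of comparing an average SNR with a threshold computed from a long-term SNR. The claim does not state the function; the affine form and the `slope` and `offset` values below are purely hypothetical placeholders, as are the function names.

```python
def vad_threshold(snr_lt, slope=0.5, offset=1.0):
    """Threshold as a function of the long-term SNR (assumed affine)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt, slope=0.5, offset=1.0):
    """Declare sound activity when the average SNR exceeds the threshold."""
    return snr_av > vad_threshold(snr_lt, slope, offset)
```

Any monotonic mapping from the long-term SNR to a threshold would fit the claim wording equally well; the affine form is used only to keep the sketch short.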

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (gain codebook) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (gain codebook) detection further comprises updating the noise estimates for a next frame .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .
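
Illustrative note on US8990073B2 claim 24 (not claim language): a minimal Python sketch of a spectral diversity value built from per-band energy ratios between the current and previous frame, summed with weights over the bands above a given band number. The ratio direction (current over previous), the default unit weights, the handling of zero energy, and the function name are assumptions.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of per-band energy ratios for bands above min_band."""
    n = len(cur_energy)
    if weights is None:
        weights = [1.0] * n  # hypothetical default: equal weights
    total = 0.0
    for b in range(min_band + 1, n):
        # Assumption: a band with zero previous energy contributes nothing.
        ratio = cur_energy[b] / prev_energy[b] if prev_energy[b] > 0.0 else 0.0
        total += weights[b] * ratio
    return total
```

Bands at or below `min_band` are skipped entirely, which matches the claim's restriction to "frequency bands higher than a given number".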

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (decoding method) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (decoding method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6636829B1
CLAIM 63
. A decoding method (second group, term correlation map, second energy values) comprising the steps of : receiving parameters of a speech signal on a frame-by-frame basis , the parameters including a line spectral frequency (LSF) for each frame ;
decoding the parameters on the frame-by-frame basis to reproduce the speech signal , wherein the decoding step uses a minimum spacing indicative of a minimum difference required between the LSFs of consecutive frames ;
detecting a lost frame ;
and setting the minimum spacing for the lost frame to a first value which is greater than the minimum spacing for the previously received frame .
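
Illustrative note on US8990073B2 claim 28 (not claim language): a minimal Python sketch of the noise character parameter as a ratio between the energy of a first group of frequency bands and the energy of the remaining bands, followed by a long-term value. Using summed band energies, the ratio direction (first over second), the first-order smoothing with factor `alpha`, and the function names are assumptions.

```python
def noise_character(band_energy, n_first):
    """Ratio of first-group energy to second-group energy."""
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    # Assumption: an empty or silent second group yields infinity.
    return first / second if second > 0.0 else float("inf")


def long_term_noise_character(prev_lt, current, alpha=0.9):
    """Long-term value via assumed first-order smoothing."""
    return alpha * prev_lt + (1.0 - alpha) * current
```

The split point `n_first` corresponds to the claim's "certain number of first frequency bands"; its value is left to the caller.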

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6636829B1
CLAIM 10
. The decoder of claim 9 wherein the frame recovery logic sets the minimum spacing for the lost frame based also at least in part on the frequency spectrum (frequency spectrum) of the speech signal .

US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6636829B1
CLAIM 10
. The decoder of claim 9 wherein the frame recovery logic sets the minimum spacing for the lost frame based also at least in part on the frequency spectrum (frequency spectrum) of the speech signal .

US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6636829B1
CLAIM 10
. The decoder of claim 9 wherein the frame recovery logic sets the minimum spacing for the lost frame based also at least in part on the frequency spectrum (frequency spectrum) of the speech signal .

US6636829B1
CLAIM 46
. The decoder of claim 29 wherein if the current frame (current frame) being processed by the decoder is the first frame to be lost after the decoder received a frame , the frame recovery logic sets the adaptive gain parameter of the first subframe of the lost frame to an arbitrarily high number .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (periodic signal) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6636829B1
CLAIM 31
. The decoder of claim 29 further comprising a periodic signal (frequency bin) detector that determines whether the speech signal is periodic , wherein if the lost frame contained nonperiodic-like speech and the gain parameter of the subframe of the lost frame is a fixed codebook gain parameter , then the frame recovery logic sets the fixed codebook gain parameter of the first subframe of the lost frame to zero .
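
Illustrative sketch (not part of either patent): claim 33 of US8990073B2 above recites filtering the correlation map bin by bin and summing the filtered map over the frequency bins. One way this could look is shown below. The first-order recursive filter, the function names, the update factor of 0.5 and the all-zero initial value are assumptions made for illustration; the claim language only requires an update factor, the current correlation map and an initial value.

```python
def update_long_term_map(long_term, cmap, update_factor):
    """Smooth each frequency bin separately (assumed first-order recursive filter)."""
    return [update_factor * lt + (1.0 - update_factor) * c
            for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


# Hypothetical three-bin example: bins 0 and 2 stay correlated over two frames.
long_term = [0.0, 0.0, 0.0]  # assumed initial value of the long-term map
for frame_cmap in ([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]):
    long_term = update_long_term_map(long_term, frame_cmap, update_factor=0.5)

summed = summed_long_term_map(long_term)
```

Under these assumptions, bins that stay correlated from frame to frame accumulate toward 1 while uncorrelated bins stay near 0, so the sum grows for tonally stable input.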

US8990073B2
CLAIM 35
. A device for detecting sound activity (gain codebook) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 36
. A device for detecting sound activity (gain codebook) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (gain codebook) detector .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (gain codebook) detector comprises a comparator of an average signal to noise ratio (second value) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6636829B1
CLAIM 5
. The decoder of claim 1 wherein the frame recovery logic sets the minimum spacing for the frame received after the lost frame to a second value (noise ratio) , the second value being greater than the minimum spacing for the frame received immediately before the lost frame and less than the minimum spacing for the lost frame .

US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (gain codebook) detector .
US6636829B1
CLAIM 74
. The method of claim 73 wherein the setting step sets the fixed codebook gain codebook (sound activity, sound activity detection, sound activity detector) parameter of all of the plurality of subframes of the lost frame to zero .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20010023395A1

Filed: 1998-09-18     Issued: 2001-09-20

Speech encoder adaptively applying pitch preprocessing with warping of target signal

(Original Assignee) Lakestar Semi Inc     (Current Assignee) Samsung Electronics Co Ltd

Huan-Yu Su, Yang Gao
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .
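
Illustrative sketch (not part of either patent): claim 1 of US8990073B2 above recites detecting peaks as the pieces of the residual spectrum between successive minima, and correlating each peak with the shape at the same position in the previous residual spectrum. The sketch below shows one possible reading. The squared normalized correlation, the treatment of the first and last bins as minima, and all function names are assumptions; the claim does not fix any of them.

```python
def detect_peaks(residual):
    """Return (start, end) bin pairs lying between successive minima.

    Assumption: a bin is a minimum if it is not larger than its
    neighbours, and the first and last bins may count as minima.
    """
    n = len(residual)
    minima = [i for i in range(n)
              if (i == 0 or residual[i] <= residual[i - 1])
              and (i == n - 1 or residual[i] <= residual[i + 1])]
    return list(zip(minima, minima[1:]))


def correlation_map(residual, prev_residual):
    """Per-bin correlation between each current peak and the previous shape."""
    cmap = [0.0] * len(residual)
    for start, end in detect_peaks(residual):
        cur = residual[start:end + 1]
        prev = prev_residual[start:end + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / den if den > 0 else 0.0  # empty shape: no correlation
        for k in range(start, end + 1):
            cmap[k] = value  # every bin of the peak gets the peak's value
    return cmap


# Hypothetical residual spectrum with two peaks, at bins 1 and 3.
current = [0, 2, 0, 3, 0]
same_shape = correlation_map(current, current)
shifted = correlation_map(current, [0, 0, 2, 0, 0])
```

With these assumptions a peak that keeps its position and shape scores 1.0 in every bin it spans, while a peak that has moved scores 0.0.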

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal (adaptive codebook) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .
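
Illustrative sketch (not part of either patent): the three steps of claim 2 of US8990073B2 above (search the minima, connect them into a spectral floor, subtract the floor) could be arranged as follows. The local-minimum test, the straight-line connection between minima, the handling of bins outside the first and last minimum, and the function names are all assumptions; the claim only says the minima are connected with each other.

```python
def find_minima(spectrum):
    """Indices of interior local minima (assumed test: lower than the
    left neighbour and not higher than the right neighbour)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor from the spectrum."""
    minima = find_minima(spectrum)
    # Assumption: outside the outermost minima the floor equals the
    # spectrum itself, so the residual is zero there.
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


# Hypothetical 7-bin log-energy spectrum with minima at bins 1, 3 and 5.
spectrum = [5, 1, 4, 2, 6, 1, 3]
residual = residual_spectrum(spectrum)
```

Under these assumptions the residual is zero at every minimum and positive on the peaks between them, which is the property the peak detection of claim 1 relies on.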

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (adaptive codebook) .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (second encoding) in the sound signal .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 10
. A method for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (adaptive codebook) is detected .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (second encoding) in the sound signal (adaptive codebook) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection comprises detecting the sound signal (adaptive codebook) based on a frequency dependent signal-to-noise ratio (SNR) .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection in the sound signal (adaptive codebook) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection further comprises updating the noise estimates for a next frame .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (adaptive codebook) and a ratio between a second order and a sixteenth order of linear prediction (linear prediction) residual error energies .
US20010023395A1
CLAIM 4
. The speech encoding system of claim 1 , wherein the second encoding scheme comprises code-excited linear prediction (linear prediction, residual error) .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .
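
Illustrative sketch (not part of either patent): claims 14, 18 and 19 of US8990073B2 above recite comparing the average SNR (SNR av) with a threshold that is a function of the long-term SNR (SNR LT), and classifying the frame as inactive or active accordingly. A minimal version is sketched below. The linear threshold function and its coefficients are hypothetical, and the handling of the equality case is an assumption, since the claims address only "inferior to" and "larger than".

```python
def snr_threshold(snr_lt, slope=0.1, offset=2.0):
    """Threshold as a function of the long-term SNR.

    The linear form and both coefficients are hypothetical placeholders.
    """
    return slope * snr_lt + offset


def classify_frame(snr_av, snr_lt):
    """Return 'active' when SNR av exceeds the threshold, else 'inactive'."""
    # Assumption: a frame exactly at the threshold is treated as inactive.
    return "active" if snr_av > snr_threshold(snr_lt) else "inactive"


# Hypothetical values in dB.
loud_frame = classify_frame(snr_av=10.0, snr_lt=20.0)
quiet_frame = classify_frame(snr_av=3.0, snr_lt=20.0)
```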

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) prevents updating of noise energy estimates when a music signal is detected .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (adaptive codebook) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .
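
Illustrative sketch (not part of either patent): claim 24 of US8990073B2 above recites a per-band ratio between current-frame and previous-frame energy for bands above a given number, combined as a weighted sum. One possible arrangement is shown below. The weights, the band limit, the use of linear energies and the function name are assumptions made for illustration only.

```python
def spectral_diversity(cur_energy, prev_energy, band_limit, weights):
    """Weighted sum of current/previous energy ratios over bands above band_limit.

    Assumption: all previous-frame energies are strictly positive.
    """
    total = 0.0
    for band in range(band_limit + 1, len(cur_energy)):
        ratio = cur_energy[band] / prev_energy[band]
        total += weights[band] * ratio
    return total


# Hypothetical four-band example; only bands 2 and 3 are above the limit.
diversity = spectral_diversity(
    cur_energy=[1.0, 2.0, 4.0, 8.0],
    prev_energy=[1.0, 1.0, 2.0, 2.0],
    band_limit=1,
    weights=[1.0, 1.0, 0.5, 0.25],
)
```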

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal (adaptive codebook) .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) and the complementary non-stationarity parameter .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .
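
Illustrative sketch (not part of either patent): the four steps of claim 28 of US8990073B2 above (split the bands into two groups, compute an energy value per group, take the ratio, track a long-term value) could be written as below. Using a plain sum as the group energy value, the first-order smoothing of the long-term value and the function names are assumptions; the claim does not specify them.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Assumption: the group energy value is a plain sum and the second
    group has non-zero energy.
    """
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    return first / second


def update_long_term_value(long_term, value, update_factor):
    """Track a long-term value (assumed first-order recursive smoothing)."""
    return update_factor * long_term + (1.0 - update_factor) * value


# Hypothetical four-band frame with most energy in the two lowest bands.
param = noise_character([2.0, 2.0, 1.0, 1.0], n_first=2)
param_lt = update_long_term_value(0.0, param, update_factor=0.5)
```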

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal (noise character parameter, activity prediction parameter) having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal (adaptive codebook) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (adaptive codebook) .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 35
. A device for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 36
. A device for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (second encoding) detector .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (second encoding) detector comprises a comparator of an average signal to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .
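
The comparator recited in claim 38 can be illustrated as follows. The claim only states that the threshold is a function of the long-term SNR; the linear form and the `slope` and `offset` values below are hypothetical placeholders, not values taken from the patent or from the cited reference.

```python
def vad_threshold(snr_lt, slope=0.5, offset=10.0):
    """Threshold as a function of the long-term SNR (hypothetical linear form)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Declare sound activity when the average SNR exceeds the adaptive threshold."""
    return snr_av > vad_threshold(snr_lt)


# With a long-term SNR of 20 dB the assumed threshold is 0.5 * 20 + 10 = 20 dB.
loud_frame = is_active(snr_av=25.0, snr_lt=20.0)
quiet_frame = is_active(snr_av=15.0, snr_lt=20.0)
```

Making the threshold depend on the long-term SNR lets the detector demand a larger margin in clean conditions and a smaller one in noisy conditions, or the reverse, depending on the sign of the assumed slope.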

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (second encoding) detector .
US20010023395A1
CLAIM 1
. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoding system comprising : an encoder processing circuit that adaptively selects a first encoding scheme or a second encoding (sound activity, detecting sound activity) scheme ;
and the first encoding scheme comprises pitch preprocessing that employs continuous warping .

US20010023395A1
CLAIM 8
. A speech encoder (frequency spectrum, noise estimator) using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (adaptive codebook) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (adaptive codebook) .
US20010023395A1
CLAIM 8
. A speech encoder using an analysis by synthesis approach on a speech signal having varying characteristics , the speech encoder comprising : an encoder processing circuit that adaptively selects a first long term prediction mode or a second long term prediction mode ;
the first long term prediction mode comprises pitch preprocessing ;
and an adaptive codebook (sound signal) coupled to the encoder to the encoder processing circuit .




CN1159639A

Filed: 1996-12-06     Issued: 1997-09-17

Variable rate vocoder (可变速率声码器)

(Original Assignee) Qualcomm Incorporated (夸尔柯姆股份有限公司)

Paul E. Jacobs, William R. Gardner, Chong U. Lee, Klein S. Gilhousen, S. Katherine Lam, Ming-Chang Tsai
US8990073B2
CLAIM 10
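. A method for detecting sound activity (sound signal [声音信号]) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise [背景噪声] (background noise signal), characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.

CN1159639A
CLAIM 2
. A variable rate code excited linear prediction (CELP) coder for variable rate coding of input frames of digitized samples of a sound signal [声音信号] (detecting sound activity) composed mainly of speech and background noise, characterized in that it comprises: means for computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized samples of a sound signal; means for selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; means for limiting the number of bits representing said LPC coefficients to a predetermined number determined by said selected rate; means for determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; means for determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined by said selected rate; and wherein, at said selected rate, there is provided for each frame a corresponding output data packet of bits representing said LPC coefficients, together with said pitch parameters and codebook parameters of each pitch and codebook subframe.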

US8990073B2
CLAIM 17
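. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (predictive coding [预测编码]) residual error energies .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise, characterized in that it comprises: computing linear prediction coding [线性预测编码] (linear prediction) coefficients (LPC) for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.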

US8990073B2
CLAIM 21
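. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (background noise [背景噪声]) and prevent update of noise energy estimates on the music signal .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise [背景噪声] (background noise signal), characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.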

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (an output [一个输出]) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1159639A
CLAIM 1
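. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise, characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output [一个输出] (second energy values) data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.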
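
The noise character computation recited in claim 28 can be sketched as below. The claim does not say how each group's energy value is formed or how the long-term value is tracked, so summing the band energies and using a first-order recursive average with factor `beta` are assumptions, as are all names and sample values.

```python
def noise_character(band_energies, n_first):
    """Ratio between the energy of the first n_first bands and the remaining bands."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0 else float("inf")


def update_long_term(previous, current, beta=0.9):
    """Recursive long-term average of the per-frame parameter (assumed form)."""
    return beta * previous + (1.0 - beta) * current


# Hypothetical per-band energies for one frame, split after the second band.
bands = [4.0, 3.0, 2.0, 1.0, 0.5, 0.5]
ratio = noise_character(bands, n_first=2)
long_term = update_long_term(0.0, ratio, beta=0.5)
```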

US8990073B2
CLAIM 35
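. A device for detecting sound activity (sound signal [声音信号]) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise [背景噪声] (background noise signal), characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.

CN1159639A
CLAIM 2
. A variable rate code excited linear prediction (CELP) coder for variable rate coding of input frames of digitized samples of a sound signal [声音信号] (detecting sound activity) composed mainly of speech and background noise, characterized in that it comprises: means for computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized samples of a sound signal; means for selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; means for limiting the number of bits representing said LPC coefficients to a predetermined number determined by said selected rate; means for determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; means for determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined by said selected rate; and wherein, at said selected rate, there is provided for each frame a corresponding output data packet of bits representing said LPC coefficients, together with said pitch parameters and codebook parameters of each pitch and codebook subframe.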

US8990073B2
CLAIM 36
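. A device for detecting sound activity (sound signal [声音信号]) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise [背景噪声] (background noise signal), characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.

CN1159639A
CLAIM 2
. A variable rate code excited linear prediction (CELP) coder for variable rate coding of input frames of digitized samples of a sound signal [声音信号] (detecting sound activity) composed mainly of speech and background noise, characterized in that it comprises: means for computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized samples of a sound signal; means for selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; means for limiting the number of bits representing said LPC coefficients to a predetermined number determined by said selected rate; means for determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; means for determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined by said selected rate; and wherein, at said selected rate, there is provided for each frame a corresponding output data packet of bits representing said LPC coefficients, together with said pitch parameters and codebook parameters of each pitch and codebook subframe.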

US8990073B2
CLAIM 40
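. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise [背景噪声]) and preventing update of noise energy estimates .
CN1159639A
CLAIM 1
. In a code excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal composed mainly of speech and background noise [背景噪声] (background noise signal), characterized in that it comprises: computing linear prediction coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples; selecting, for each frame, an output data packet rate from a set of data packet rates according to at least one of said LPC coefficients; limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate; determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate; determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined according to said selected rate; and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.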




US5848388A

Filed: 1995-12-19     Issued: 1998-12-08

Speech recognition with sequence parsing, rejection and pause detection options

(Original Assignee) British Telecommunications PLC     (Current Assignee) British Telecommunications PLC

Kevin Joseph Power, Stephen Howard Johnson, Francis James Scahill, Simon Patrick Ringland, John Edward Talintyre
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (energy levels) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US5848388A
CLAIM 24
. A recognition system according to claim 1 , wherein said signal parameter is derived from a plurality of energy levels (frequency bin basis) provided by an energy averager , said energy averager comprising : means for storing an energy level relating to previous energy levels of the speech signal ;
means for comparing the difference between the speech signal energy and said energy level with a threshold ;
means for varying the stored energy level in response to the difference exceeding the threshold ;
means for varying the threshold depending upon the difference ;
and output means for providing said energy level .
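
The two steps of claim 5 (a one-pole filter applied per frequency bin, followed by a sum over bins) can be sketched as follows. The recursion form, the update factor `alpha`, the zero initial value and the sample correlation maps are assumptions chosen for illustration only.

```python
def update_long_term_map(long_term, cmap, alpha):
    """One-pole (first-order recursive) filter applied independently to each bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_map(long_term):
    """Collapse the per-bin long-term map into a single tonal stability value."""
    return sum(long_term)


# Hypothetical 4-bin correlation map that stays identical over three frames.
frames = [[0.0, 1.0, 1.0, 0.0]] * 3
long_term = [0.0] * 4          # assumed initial value of the long-term map
for cmap in frames:
    long_term = update_long_term_map(long_term, cmap, alpha=0.5)
total = summed_map(long_term)
```

A stable tone keeps feeding high correlation values into the same bins, so the summed value grows toward the number of tonal bins; a transient that appears in one frame only decays away at a rate set by `alpha`.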

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (minimum values) indicative of sound activity in the sound signal .
US5848388A
CLAIM 12
. A system as in claim 11 in which said variation detecting means is arranged to derive maximum and minimum values (adaptive threshold) of said parameter or derived parameter , and to derive said measure so as to depend upon the ratio between .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (probability distributions) indicative of an activity of the sound signal .
US5848388A
CLAIM 22
. A system as in claim 21 in which the recognition processing means comprises : means for storing data defining a plurality of continuous probability distributions (activity prediction parameter) corresponding to different states , and means for applying said distribution data to said speech signal to calculate a measure of the correspondence between the speech signal and each said state .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (probability distributions) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US5848388A
CLAIM 22
. A system as in claim 21 in which the recognition processing means comprises : means for storing data defining a plurality of continuous probability distributions (activity prediction parameter) corresponding to different states , and means for applying said distribution data to said speech signal to calculate a measure of the correspondence between the speech signal and each said state .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (probability distributions) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US5848388A
CLAIM 22
. A system as in claim 21 in which the recognition processing means comprises : means for storing data defining a plurality of continuous probability distributions (activity prediction parameter) corresponding to different states , and means for applying said distribution data to said speech signal to calculate a measure of the correspondence between the speech signal and each said state .
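
Claims 26 and 27 together describe a long-term value of a binary decision and a two-threshold test that blocks the noise energy update. A minimal sketch follows; the smoothing factor `gamma`, both threshold values and all function names are hypothetical, and the recursive average is only one possible way to form a "long-term value".

```python
def update_activity_prediction(act_pred, binary_decision, gamma=0.99):
    """Long-term (recursively averaged) value of a per-frame binary decision."""
    return gamma * act_pred + (1.0 - gamma) * (1.0 if binary_decision else 0.0)


def noise_update_allowed(act_pred, nonstat, thr_act, thr_nonstat):
    """Block the noise update only when both parameters exceed their fixed thresholds."""
    return not (act_pred > thr_act and nonstat > thr_nonstat)


# Three consecutive frames flagged as tonal, with an assumed gamma of 0.5.
act_pred = 0.0
for _ in range(3):
    act_pred = update_activity_prediction(act_pred, True, gamma=0.5)

blocked = not noise_update_allowed(act_pred, nonstat=6.0, thr_act=0.8, thr_nonstat=5.0)
```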

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (different states) for the first group of frequency bands and a second energy value (different states) of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5848388A
CLAIM 22
. A system as in claim 21 in which the recognition processing means comprises : means for storing data defining a plurality of continuous probability distributions corresponding to different states (first energy value, second energy value) , and means for applying said distribution data to said speech signal to calculate a measure of the correspondence between the speech signal and each said state .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (energy levels) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US5848388A
CLAIM 24
. A recognition system according to claim 1 , wherein said signal parameter is derived from a plurality of energy levels (frequency bin basis) provided by an energy averager , said energy averager comprising : means for storing an energy level relating to previous energy levels of the speech signal ;
means for comparing the difference between the speech signal energy and said energy level with a threshold ;
means for varying the stored energy level in response to the difference exceeding the threshold ;
means for varying the threshold depending upon the difference ;
and output means for providing said energy level .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (second value) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US5848388A
CLAIM 16
. A system as in claim 15 in which said measure is derived so as to depend upon the ratio between a first value derived from said pattern-containing portion and a second value (noise ratio) derived from said silence or noise portion .




CN1131473A

Filed: 1995-08-01     Issued: 1996-09-18

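Method and apparatus for selecting an encoding rate in a variable rate vocoder (在速率可变的声码器中选择编码速率的方法和装置)

(Original Assignee) Qualcomm Incorporated (夸尔柯姆股份有限公司)

Andrew P. DeJaco, William R. Gardner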
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (a threshold [一个阈值]) indicative of sound activity in the sound signal .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold [一个阈值] (adaptive threshold) by multiplying a background noise estimate by said scale value.

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold by multiplying a background noise [背景噪声] (background noise signal) estimate by said scale value.

US8990073B2
CLAIM 15
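. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (calculation means [计算装置]) .
CN1131473A
CLAIM 1
. An apparatus for determining an encoding rate for a variable rate vocoder, characterized in that it comprises: subband energy calculation means [计算装置] (SNR calculation) for receiving an input signal and determining a plurality of subband energy values according to a predetermined subband energy calculation formula; and rate determination means for receiving said plurality of subband energy values and determining said encoding rate according to said plurality of subband energy values.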

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (background noise [背景噪声]) and prevent update of noise energy estimates on the music signal .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold by multiplying a background noise [背景噪声] (background noise signal) estimate by said scale value.

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold by multiplying a background noise [背景噪声] (background noise signal) estimate by said scale value.

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold by multiplying a background noise [背景噪声] (background noise signal) estimate by said scale value.

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise [背景噪声]) and preventing update of noise energy estimates .
CN1131473A
CLAIM 6
. The apparatus of claim 5, characterized in that the threshold calculation means determines at least one threshold by multiplying a background noise [背景噪声] (background noise signal) estimate by said scale value.




CN1512487A

Filed: 1995-08-01     Issued: 2004-07-14

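Method and apparatus for selecting an encoding rate in a variable rate vocoder (在速率可变的声码器中选择编码速率的方法和装置)

(Original Assignee) Qualcomm Incorporated (夸尔柯姆股份有限公司)

Andrew P. DeJaco, William R. Gardner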
US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1512487A
CLAIM 6
. The method of claim 1, characterized in that it further comprises the step of generating an estimate of the background noise [背景噪声] (background noise signal) level.

US8990073B2
CLAIM 15
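. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (calculation means [计算装置], to calculate [来计算]) .
CN1512487A
CLAIM 7
. The method of claim 6, characterized in that it further comprises the step of calculating [来计算] (SNR calculation) said signal-to-noise ratio according to the estimate of the background noise level.

CN1512487A
CLAIM 21
. The apparatus of claim 20 for adding hangover frames to a plurality of frames encoded by a vocoder, characterized in that it further comprises energy calculation means [计算装置] (SNR calculation) coupled to said threshold adaptation means, the energy calculation means being configured to generate an estimate of a frame energy level, said threshold adaptation means being further configured to receive the estimate of the frame energy level from said energy calculation means and to calculate said signal-to-noise ratio according to the estimate of the frame energy level and the estimate of the background noise level.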

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (background noise [背景噪声]) and prevent update of noise energy estimates on the music signal .
CN1512487A
CLAIM 6
. The method of claim 1, characterized in that it further comprises the step of generating an estimate of the background noise [背景噪声] (background noise signal) level.

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1512487A
CLAIM 6
. The method of claim 1, characterized in that it further comprises the step of generating an estimate of the background noise [背景噪声] (background noise signal) level.

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1512487A
CLAIM 6
. The method of claim 1, characterized in that it further comprises the step of generating an estimate of the background noise [背景噪声] (background noise signal) level.

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise [背景噪声]) and preventing update of noise energy estimates .
CN1512487A
CLAIM 6
. The method of claim 1, characterized in that it further comprises the step of generating an estimate of the background noise [背景噪声] (background noise signal) level.




CN1945696A

Filed: 1995-08-01     Issued: 2007-04-11

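Method and apparatus for selecting an encoding rate in a variable rate vocoder (在速率可变的声码器中选择编码速率的方法和装置)

(Original Assignee) Qualcomm Incorporated (高通股份有限公司)

Andrew P. DeJaco, William R. Gardner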
US8990073B2
CLAIM 8
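. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (a threshold [一个阈值]) indicative of sound activity in the sound signal .
CN1945696A
CLAIM 4
. The apparatus of claim 3, characterized in that each of said plurality of threshold adaptation components determines a threshold [一个阈值] (adaptive threshold) according to the signal energy of its assigned frequency subband and a background noise estimate, the threshold being used to determine whether a sound signal is present in the assigned frequency subband.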

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise [背景噪声]) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1945696A
CLAIM 4
. The apparatus of claim 3, characterized in that each of said plurality of threshold adaptation components determines a threshold according to the signal energy of its assigned frequency subband and a background noise [背景噪声] (background noise signal) estimate, the threshold being used to determine whether a sound signal is present in the assigned frequency subband.

US8990073B2
CLAIM 15
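. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (computation means) .
CN1945696A
CLAIM 7
. The apparatus of claim 1 , wherein the encoding rate is determined for a variable rate vocoder , and wherein the audio signal detection means comprises subband energy computation means (SNR calculation) (4 , 6) for receiving the input signal (S(n)) and determining a plurality of subband energy values (RL(0) , RH(0)) in accordance with a predetermined subband energy computation formula .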

US8990073B2
CLAIM 21
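. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (background noise) and prevent update of noise energy estimates on the music signal .
CN1945696A
CLAIM 4
. The apparatus of claim 3 , wherein each of the plurality of threshold adaptation elements determines a threshold based on the signal energy of the assigned frequency subband and an estimate of the background noise (background noise signal) , the threshold being used to determine whether an audio signal is present in the assigned frequency subband .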

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (residual signal) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1945696A
CLAIM 5
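. The apparatus of claim 2 , wherein each threshold adaptation element determines the presence of an audio signal by examining a normalized autocorrelation function given by :
$$\mathrm{NACF} = \max_{T} \frac{\sum_{n=0}^{N-1} e(n) \cdot e(n-T)}{\frac{1}{2}\left[\sum_{n=0}^{N-1} e^{2}(n) + \sum_{n=0}^{N-1} e^{2}(n-T)\right]} \qquad (7)$$
where e(n) is the formant residual signal (first group) obtained by filtering the input signal (S(n)) with an LPC filter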
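
For orientation, the normalized autocorrelation function of equation (7) in CN1945696A claim 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the function name `nacf`, the buffer layout (past residual samples followed by the current frame), and the candidate lag range are hypothetical choices made for illustration.

```python
def nacf(residual, frame_len, lags):
    """Maximum over lags T of the normalized autocorrelation of e(n).

    `residual` holds past samples followed by the current frame of
    `frame_len` samples, so e(n - T) exists for every candidate lag T.
    (Buffer layout and lag range are assumptions of this sketch.)
    """
    lags = list(lags)
    start = len(residual) - frame_len  # index of e(0) in the buffer
    if not lags or start < max(lags):
        raise ValueError("need at least max(lags) samples of history")
    best = float("-inf")
    for lag in lags:
        cur = residual[start:start + frame_len]               # e(n)
        past = residual[start - lag:start - lag + frame_len]  # e(n - T)
        num = sum(a * b for a, b in zip(cur, past))
        den = 0.5 * (sum(a * a for a in cur) + sum(b * b for b in past))
        value = num / den if den > 0.0 else 0.0
        best = max(best, value)
    return best


# A residual with period 4 is perfectly correlated with itself at lag 4.
periodic = [1.0, 0.0, -1.0, 0.0] * 4
print(nacf(periodic, frame_len=8, lags=range(2, 7)))  # 1.0
```

A value near 1 indicates a strongly periodic residual, while a value near 0 indicates a noise-like residual.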

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1945696A
CLAIM 4
. The apparatus of claim 3 , wherein each of the plurality of threshold adaptation elements determines a threshold based on the signal energy of the assigned frequency subband and an estimate of the background noise (background noise signal) , the threshold being used to determine whether an audio signal is present in the assigned frequency subband .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1945696A
CLAIM 4
. The apparatus of claim 3 , wherein each of the plurality of threshold adaptation elements determines a threshold based on the signal energy of the assigned frequency subband and an estimate of the background noise (background noise signal) , the threshold being used to determine whether an audio signal is present in the assigned frequency subband .

US8990073B2
CLAIM 40
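. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise) and preventing update of noise energy estimates .
CN1945696A
CLAIM 4
. The apparatus of claim 3 , wherein each of the plurality of threshold adaptation elements determines a threshold based on the signal energy of the assigned frequency subband and an estimate of the background noise (background noise signal) , the threshold being used to determine whether an audio signal is present in the assigned frequency subband .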




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1233408A1

Filed: 1995-08-01     Issued: 2002-08-21

Method and apparatus for selecting an encoding rate in a variable rate vocoder

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Andrew P. Dejaco, William R. Gardner
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .
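
Claim 1 of US8990073B2 names the inputs of the long-term correlation map (an update factor, the correlation map of the current frame, and an initial value) but the claim text quoted here does not state the update rule. One plausible reading is a first-order recursive average per frequency bin. The Python sketch below assumes that form; the names `update_long_term_map` and `alpha`, the value 0.9, and the all-zero initial map are hypothetical.

```python
def update_long_term_map(long_term, current_map, alpha=0.9):
    """One frame of a recursive per-bin average (assumed update rule).

    long_term   -- long-term correlation map from the previous frame
    current_map -- correlation map computed for the current frame
    alpha       -- update factor in [0, 1]; larger means slower adaptation
    """
    if len(long_term) != len(current_map):
        raise ValueError("maps must cover the same frequency bins")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term, current_map)]


# Start from an assumed initial value of zero and feed in two frames.
lt_map = [0.0, 0.0, 0.0]
for frame_map in ([1.0, 0.5, 0.0], [1.0, 0.5, 0.0]):
    lt_map = update_long_term_map(lt_map, frame_map, alpha=0.5)
print(lt_map)  # [0.75, 0.375, 0.0]
```

Under this reading, bins whose peaks stay correlated from frame to frame accumulate a high long-term value, which is what makes the map usable as an indicator of tonal stability.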

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency subbands) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .
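
The three steps of US8990073B2 claim 2 (search for minima, connect them into a spectral floor, subtract the floor) can be illustrated with a short Python sketch. The claim does not say how the minima are connected, so the piecewise-linear interpolation used here is an assumption, as are the function names and the choice to treat the first and last bins as minima.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("spectrum needs at least two bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor, and the minima."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


residual, minima = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
print(minima)    # [0, 2, 4]
print(residual)  # [0.0, 2.0, 0.0, 3.5, 0.0]
```

The residual is zero at every minimum by construction, so each stretch between two consecutive minima isolates one peak.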

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency subbands) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency subbands) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .
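
The per-peak correlation map of US8990073B2 claim 4 can be sketched as follows. The claim requires a "normalized correlation value" but the text quoted here gives no formula, so this Python sketch assumes the common form sum(a*b) / sqrt(sum(a^2) * sum(b^2)). The function name and the handling of a bin shared by two peaks (the later peak's score wins) are also assumptions.

```python
import math


def correlation_map(current, previous, minima):
    """Assign each peak's normalized correlation to all bins of that peak.

    current, previous -- residual spectra of the current and previous frame
    minima            -- sorted indices of the minima of `current`;
                         each consecutive pair delimits one peak
    """
    if len(current) != len(previous):
        raise ValueError("spectra must have the same number of bins")
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0  # the peak's score
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap


current = [0.0, 2.0, 0.0, 3.0, 0.0]
previous = [0.0, 2.0, 0.0, 0.0, 0.0]
print(correlation_map(current, previous, [0, 2, 4]))
# [1.0, 1.0, 0.0, 0.0, 0.0]
```

In the example, the first peak repeats from the previous frame and scores 1.0, while the second peak is new and scores 0.0.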

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (rate selection) .
EP1233408A1
CLAIM 31
A system for selecting an encoding rate for an input signal , comprising : a subband filter subsystem for determining a signal energy for each frequency subband of the input signal ;
and a rate selection (SNR calculation) subsystem for selecting the encoding rate of the input signal based upon the signal energies of each frequency subband of the input signal .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (combined signal, subband filter) for a next frame .
EP1233408A1
CLAIM 31
A system for selecting an encoding rate for an input signal , comprising : a subband filter (noise estimates) subsystem for determining a signal energy for each frequency subband of the input signal ;
and a rate selection subsystem for selecting the encoding rate of the input signal based upon the signal energies of each frequency subband of the input signal .

EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal (noise estimates) energies of the frequency subbands of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (assigned frequency, band signal) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1233408A1
CLAIM 32
The system of Claim 31 , wherein the subband filter subsystem comprises a plurality of subband energy computation elements , and each of the plurality of subband energy computation elements is for determining a frequency subband signal (first frequency, first frequency bands, frequency bands) energy .

EP1233408A1
CLAIM 37
The apparatus of Claim 36 , wherein the audio signal detection device comprises : a plurality of subband energy computation elements for determining a signal energy for each frequency subband of the input signal ;
and a plurality of threshold adaptation elements , each threshold adaptation element communicatively coupled to one of the plurality of subband energy computation elements , wherein each threshold adaptation element is for using the signal energy of an assigned frequency (first frequency, first frequency bands, frequency bands) subband to determine whether an audio signal is present in the assigned frequency subband .
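
As an illustration of the spectral diversity computation recited in US8990073B2 claim 24, the Python sketch below forms a per-band energy ratio between the current and previous frame for bands above a given index and returns their weighted sum. The function name, the uniform default weights, and the small floor that guards against division by zero are assumptions; the claim text quoted here does not specify them.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None,
                       floor=1e-12):
    """Weighted sum of current/previous energy ratios for bands > min_band.

    weights -- one weight per band; defaults to 1.0 everywhere (assumption)
    floor   -- lower bound on the denominator (assumption)
    """
    if len(cur_energy) != len(prev_energy):
        raise ValueError("frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[band] / max(prev_energy[band], floor)
        total += weights[band] * ratio
    return total


cur = [1.0, 2.0, 4.0, 8.0]
prev = [1.0, 1.0, 2.0, 2.0]
print(spectral_diversity(cur, prev, min_band=1))  # 6.0
```

Only bands 2 and 3 contribute in the example (ratios 2.0 and 4.0), since the claim restricts the sum to bands higher than the given number.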

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (assigned frequency, band signal) into a first group of a certain number of first frequency (assigned frequency, band signal) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1233408A1
CLAIM 32
The system of Claim 31 , wherein the subband filter subsystem comprises a plurality of subband energy computation elements , and each of the plurality of subband energy computation elements is for determining a frequency subband signal (first frequency, first frequency bands, frequency bands) energy .

EP1233408A1
CLAIM 37
The apparatus of Claim 36 , wherein the audio signal detection device comprises : a plurality of subband energy computation elements for determining a signal energy for each frequency subband of the input signal ;
and a plurality of threshold adaptation elements , each threshold adaptation element communicatively coupled to one of the plurality of subband energy computation elements , wherein each threshold adaptation element is for using the signal energy of an assigned frequency (first frequency, first frequency bands, frequency bands) subband to determine whether an audio signal is present in the assigned frequency subband .
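
The noise character parameter of US8990073B2 claim 28 can be sketched in Python as a ratio of two group energies followed by long-term smoothing. The claim does not say whether a group's energy value is a sum or a mean, nor how the long-term value is formed; this sketch assumes a sum and a first-order recursive average, and all names and default values are hypothetical.

```python
def noise_character(band_energies, n_first, floor=1e-12):
    """Ratio of the first group's energy to the second group's energy.

    The first group is the first `n_first` bands; the second group is the
    rest. Group energy is taken as a plain sum (assumption).
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("both groups must contain at least one band")
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / max(second, floor)


def long_term_noise_character(previous, current, alpha=0.9):
    """Recursive average of the parameter across frames (assumed form)."""
    return alpha * previous + (1.0 - alpha) * current


value = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
print(value)                                               # 4.0
print(long_term_noise_character(0.0, value, alpha=0.5))    # 2.0
```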

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency subbands) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1233408A1
CLAIM 35
The system of Claim 33 , wherein the plurality of threshold adaptation elements are configured to determine a threshold value based upon the combined signal energies of the frequency subbands (current residual spectrum) of the input signal , wherein the threshold value is used to determine whether the audio signal is present in the frequency subband .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (bandpass filter) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
EP1233408A1
CLAIM 2
The apparatus of Claim 1 wherein said subband energy computation means determines each of said plurality of subband energy values in accordance with the equation : where L is the number taps in a bandpass filter (average signal) h_bp(n) , where R_s(i) is the autocorrelation function of the input signal , S(n) , and where R_hbp is the autocorrelation function of the bandpass filter h_bp(n) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1703493A2

Filed: 1995-08-01     Issued: 2006-09-20

Method and apparatus for selecting an encoding rate in a variable rate vocoder

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Andrew P. Dejaco, William R. Gardner
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (second threshold) indicative of sound activity in the sound signal .
EP1703493A2
CLAIM 5
The method of Claims 2 and 4 , wherein comparing the normalized autocorrelation function of the formant residual signal to the detection thresholds comprises : comparing the normalized autocorrelation function of the formant residual signal to a first threshold ;
updating the background noise energy estimate if the normalized autocorrelation function of the formant residual signal is less than the first threshold ;
comparing the normalized autocorrelation function of the formant residual signal to a second threshold (adaptive threshold) , wherein the second threshold is higher than the first threshold ;
updating the signal energy estimate if the normalized autocorrelation function of the formant residual signal is greater than the second threshold ;
and using the updated background noise energy estimate and the updated signal energy estimate to determine whether the input signal has an audio signal or silence .
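
The two-threshold logic of EP1703493A2 claim 5 can be illustrated with the Python sketch below: a low normalized autocorrelation triggers an update of the background noise energy estimate, and a high one triggers an update of the signal energy estimate. The claim does not state how each estimate is updated, so the recursive smoothing used here is an assumption, as are the function name, the threshold values, and the smoothing constant.

```python
def update_estimates(nacf, frame_energy, noise_est, signal_est,
                     th1=0.3, th2=0.6, smooth=0.9):
    """Return updated (noise_est, signal_est) for one frame.

    th1, th2 -- first and second thresholds, with th2 > th1 as in the claim
    smooth   -- smoothing constant of the assumed recursive update
    """
    if th2 <= th1:
        raise ValueError("second threshold must be higher than the first")
    if nacf < th1:
        # Weak periodicity: treat the frame as background noise.
        noise_est = smooth * noise_est + (1.0 - smooth) * frame_energy
    if nacf > th2:
        # Strong periodicity: treat the frame as active signal.
        signal_est = smooth * signal_est + (1.0 - smooth) * frame_energy
    return noise_est, signal_est


print(update_estimates(0.1, 10.0, 2.0, 50.0, smooth=0.5))   # (6.0, 50.0)
print(update_estimates(0.9, 10.0, 2.0, 50.0, smooth=0.5))   # (2.0, 30.0)
print(update_estimates(0.45, 10.0, 2.0, 50.0, smooth=0.5))  # (2.0, 50.0)
```

Frames whose autocorrelation falls between the two thresholds leave both estimates unchanged, which keeps ambiguous frames from contaminating either one.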

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
EP1703493A2
CLAIM 1
A method for detecting whether a frame of an input signal has an audio signal or silence , comprising : setting detection thresholds based upon an estimate of a signal to noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) of the input signal , wherein the signal energy of the SNR is estimated as a maximum signal energy during a time of active speech ;
and using the detection thresholds to detect whether the frame of the input signal has an audio signal or silence .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
EP1703493A2
CLAIM 1
A method for detecting whether a frame of an input signal has an audio signal or silence , comprising : setting detection thresholds based upon an estimate of a signal to noise ratio (noise ratio, SNR LT, SNR calculation) (SNR) of the input signal , wherein the signal energy of the SNR is estimated as a maximum signal energy during a time of active speech ;
and using the detection thresholds to detect whether the frame of the input signal has an audio signal or silence .
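
Claims 15 and 38 of US8990073B2 describe an SNR-based decision that uses noise energy estimates from the previous frame and compares an average SNR with a threshold that is a function of the long-term SNR. The Python sketch below shows one way such a comparator could look. The per-band averaging in dB, the linear threshold function, and every name and constant are assumptions made for illustration; none of them is taken from the patent or from EP1703493A2.

```python
import math


def average_snr_db(band_energies, prev_noise_estimates, floor=1e-12):
    """Mean per-band SNR in dB, using the previous frame's noise estimates."""
    if len(band_energies) != len(prev_noise_estimates):
        raise ValueError("need one noise estimate per band")
    snrs = [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(band_energies, prev_noise_estimates)]
    return sum(snrs) / len(snrs)


def is_active(snr_av, snr_lt, slope=0.1, offset=2.0):
    """Compare SNR_av with a threshold that depends on SNR_LT.

    The linear threshold slope * snr_lt + offset is a hypothetical choice.
    """
    threshold = slope * snr_lt + offset
    return snr_av > threshold


snr_av = average_snr_db([100.0, 100.0], [1.0, 1.0])  # about 20 dB
print(is_active(snr_av, snr_lt=30.0))  # True  (threshold is 5 dB)
print(is_active(3.0, snr_lt=30.0))     # False
```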




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5751903A

Filed: 1994-12-19     Issued: 1998-05-12

Low rate multi-mode CELP codec that encodes line SPECTRAL frequencies utilizing an offset

(Original Assignee) Hughes Electronics Corp     (Current Assignee) JPMorgan Chase Bank NA ; Hughes Network Systems LLC

Kumar Swaminathan, Murthy Vemuganti
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map (decoding method) .
US5751903A
CLAIM 13
. The decoding method (second group, term correlation map, second energy values) according to claim 12 , wherein : the second mode is a voiced mode wherein , for a digitized speech signal segment classified in the voiced mode , the step of determining the first subset of inverse quantized line spectral frequencies comprises , for each member of the first subset , the steps of : predicting a line spectral frequency as a weighted sum of neighboring scalar quantized line spectral frequencies determined for a preceding digitized speech signal segment ;
and determining the inverse quantized line spectral frequency based on the predicted line spectral frequency and a corresponding scalar quantizer parameter from the set of scalar quantizer parameters , which encodes an offset from the predicted line spectral frequency .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (extracted set) when a tonal sound signal is detected .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (extracted set) calculated in a previous frame in a SNR calculation .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (extracted set) for a next frame .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (extracted set) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (extracted set) when a music signal is detected .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (extracted set) on the music signal .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (extracted set) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .
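
The gating condition recited in claim 27 of US8990073B2, charted above, can be illustrated with a minimal sketch. The function name, argument names, and threshold values are hypothetical and are not taken from either patent; the claim only requires that both parameters exceed their respective fixed thresholds at the same time.

```python
def prevent_noise_update(activity_prediction, comp_nonstationarity,
                         thr_activity, thr_nonstationarity):
    """Return True when the noise energy update should be skipped.

    Both conditions must hold simultaneously, as recited in claim 27.
    All names and thresholds here are illustrative assumptions.
    """
    return (activity_prediction > thr_activity
            and comp_nonstationarity > thr_nonstationarity)


# Hypothetical thresholds chosen only for this example.
THR_ACTIVITY = 0.8
THR_NONSTATIONARITY = 5.0

skip = prevent_noise_update(0.9, 6.0, THR_ACTIVITY, THR_NONSTATIONARITY)
```

Nothing in the charted US5751903A claim 12 describes a comparable two-threshold test, which is worth keeping in mind when weighing this mapping.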

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (decoding method) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (decoding method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5751903A
CLAIM 13
. The decoding method (second group, term correlation map, second energy values) according to claim 12 , wherein : the second mode is a voiced mode wherein , for a digitized speech signal segment classified in the voiced mode , the step of determining the first subset of inverse quantized line spectral frequencies comprises , for each member of the first subset , the steps of : predicting a line spectral frequency as a weighted sum of neighboring scalar quantized line spectral frequencies determined for a preceding digitized speech signal segment ;
and determining the inverse quantized line spectral frequency based on the predicted line spectral frequency and a corresponding scalar quantizer parameter from the set of scalar quantizer parameters , which encodes an offset from the predicted line spectral frequency .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (extracted set) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .
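
Claims 28 and 29 of US8990073B2, charted above, describe a band-energy ratio, its long-term value, and a threshold test. The sketch below is one possible reading, assuming that each group's energy value is the sum of its per-band energies and that the long-term value is a first-order recursive average; the claims fix neither choice. All names, the smoothing factor `alpha`, and the guard constant `EPS` are hypothetical.

```python
EPS = 1e-12  # guard against division by zero (assumption, not from the claims)


def noise_character(band_energies, n_first):
    """Ratio between the energy of the first n_first bands and the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / max(second, EPS)


def update_long_term(prev_long_term, value, alpha=0.9):
    """Recursive average used here as the 'long-term value' (assumed form)."""
    return alpha * prev_long_term + (1.0 - alpha) * value


def update_prevented(noise_char, threshold):
    """Claim 29: block the noise update when the parameter is below threshold."""
    return noise_char < threshold


ratio = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
long_term = update_long_term(2.0, ratio, alpha=0.5)
```

Claim 29 does not state whether the instantaneous or the long-term value is compared against the threshold, so `update_prevented` accepts either.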

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (extracted set) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (extracted set) .
US5751903A
CLAIM 12
. A method of decoding a data bitstream containing encoded parameters for a segment of a digitized speech signal comprising the steps of : extracting from the data bitstream : a mode parameter encoding a mode of the digitized speech signal segment , a set of scalar quantizer parameters , and a vector quantizer parameter ;
classifying the digitized speech signal segment in one of a plurality of predetermined modes based on the extracted mode parameter , the plurality of predetermined modes comprising a first mode and a second mode ;
determining a set of inverse quantized line spectral frequencies for the digitized speech signal segment by determining a first subset of inverse quantized line spectral frequencies based on the extracted set (noise energy estimates, noise estimates) of scalar quantizer parameters , and determining a second subset of inverse quantized line spectral frequencies based on the extracted vector quantizer parameter , wherein the set of scalar quantizer parameters and the vector quantizer parameter , for digitized speech signal segments classified in the second mode , represent a set of offsets generated through backward prediction from analysis of a preceding digitized speech signal segment .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JPH07334190A

Filed: 1994-06-14     Issued: 1995-12-22

Harmonic amplitude value quantization device (高調波振幅値量子化装置)

(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     

Tadashi Yonezaki, 正 米崎
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数, "harmonic frequencies") of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JPH07334190A
CLAIM 3
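[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies (高調波周波数) (frequency spectrum, frequency dependent signal) obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.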
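
The peak-detection element of claim 1 of US8990073B2, charted above, treats each peak as the piece of the residual spectrum lying between two successive minima. A minimal sketch of that segmentation follows; the function names and the exact local-minimum test (strictly below the left neighbour, not above the right one) are assumptions made for illustration.

```python
def local_minima(values):
    """Indices of interior local minima of a sequence (assumed definition)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]


def detect_peaks(residual):
    """Split a residual spectrum into peaks delimited by successive minima.

    Each peak is returned as (start_bin, end_bin, samples), where both
    delimiting minima are included in the samples.
    """
    minima = local_minima(residual)
    return [(a, b, residual[a:b + 1]) for a, b in zip(minima, minima[1:])]


peaks = detect_peaks([1.0, 0.0, 2.0, 0.5, 3.0, 0.0, 1.0])
```

In this toy input the minima fall at bins 1, 3 and 5, giving two peaks. The charted JPH07334190A claim 3 recites spectral-envelope subtraction but no comparable minima-based segmentation.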

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (高調波周波数, "harmonic frequencies") of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JPH07334190A
CLAIM 3
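[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies (高調波周波数) (frequency spectrum, frequency dependent signal) obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.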
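
The three steps of claim 2 of US8990073B2, charted above (locate minima, connect them into a spectral floor, subtract the floor), can be sketched as follows. This is an illustration under stated assumptions: the minima are connected by straight lines, and outside the first and last minimum the floor is set equal to the spectrum itself so that the residual there is zero. Neither detail is dictated by the claim, and all function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of interior local minima of the spectrum (assumed definition)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]


def spectral_floor(spectrum, minima):
    """Piecewise-linear floor obtained by connecting successive minima."""
    floor = list(spectrum)  # bins outside the minima keep the spectrum value
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 7.0])
```

By contrast, the subtraction recited in the charted JPH07334190A claim 3 removes a reference spectral envelope from the log power spectrum, not a floor built from the spectrum's own minima.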

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (の誤差, "error") energies .
JPH07334190A
CLAIM 3
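[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error (の誤差) (residual error) of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.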

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数, "harmonic frequencies") of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH07334190A
CLAIM 3
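[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies (高調波周波数) (frequency spectrum, frequency dependent signal) obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.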

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数, "harmonic frequencies") of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH07334190A
CLAIM 3
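[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies (高調波周波数) (frequency spectrum, frequency dependent signal) obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.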

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (高調波周波数, "harmonic frequencies") of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JPH07334190A
CLAIM 3
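[Claim 3] A harmonic amplitude value quantization device comprising: a fundamental frequency extractor that obtains the fundamental frequency of input speech; a spectral envelope predictor that predicts a spectral envelope using the obtained fundamental frequency and the harmonic amplitude values decoded by a harmonic amplitude value decoder; a power spectrum calculator that obtains the logarithmic power spectrum of the input speech; a spectral subtractor that obtains the difference between the power spectrum and a reference spectral envelope; an improved cepstrum analyzer that obtains the cepstral coefficients of this difference spectrum; a cepstral coefficient quantizer that vector-quantizes the cepstral coefficients; a spectral envelope calculator that obtains a difference spectral envelope by applying a discrete Fourier transform to the quantized cepstral coefficients; a spectral adder that obtains a quantized spectral envelope by adding the difference spectral envelope to the predicted spectral envelope; a harmonic amplitude value residual calculator that calculates the error of the quantized spectral envelope relative to the power spectrum of the input speech at the harmonic frequencies (高調波周波数) (frequency spectrum, frequency dependent signal) obtained on the basis of the fundamental frequency from the fundamental frequency extractor; a harmonic amplitude value residual quantizer that scalar-quantizes the harmonic amplitude value error; and a harmonic amplitude value decoder that decodes harmonic amplitude values from the fundamental frequency from the fundamental frequency extractor, the quantized spectral envelope, and the quantized harmonic amplitude value residual.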

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator (フレーム, "frame") of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JPH07334190A
CLAIM 1
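[Claim 1] A harmonic amplitude value quantization device in a harmonic speech coding and decoding device, comprising: a power spectrum calculator that calculates the logarithmic power spectrum of input speech; a difference spectrum calculator that obtains a difference spectrum by subtracting, from the logarithmic power spectrum, the spectral envelope of the previous frame (フレーム) (tonal stability estimator) obtained by a spectral envelope decoder; a spectral envelope calculator that calculates parameters determining the spectral envelope of the difference spectrum; a quantizer that quantizes the parameters; a quantized difference spectral envelope calculator that calculates the quantized difference spectral envelope obtained from the quantized parameters; and a spectral envelope decoder that decodes the spectral envelope of the current frame by adding the spectral envelope of the previous frame and the quantized difference spectral envelope.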




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5594833A

Filed: 1994-01-28     Issued: 1997-01-14

Rapid sound data compression in code book creation

(Original Assignee) Miyazawa; Takeo     

Takeo Miyazawa
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map (correlation map) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map (correlation map) comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (initial values) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US5594833A
CLAIM 2
. A sound data compressing method according to claim 1 , wherein said map forming means comprises : initial value setting means for setting initial values (correlation value) , which are weighted for each of said plural division regions , so as to set said plural representative sound values , presenting means for presenting said input sound data pattern to the initial values for said respective plural division regions set by said initial value setting means , distance calculating means for calculating distances between said input sound data pattern presented by said presenting means and the initial values for said respective plural division regions in directions of three-or-more dimensions , and update means for updating the plural representative sound values for said respective plural division regions based on a result calculated by said distance calculating means , thereby updating said three-or-more dimensional correlation map .
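
The correlation-map construction in claim 4 of US8990073B2, charted above, can be sketched as follows. The normalized correlation is taken here in its common cosine form (inner product divided by the product of the norms), which is an assumption; the claim says only "normalized correlation value". The function name and the handling of bins outside any peak (left at zero) are likewise illustrative choices.

```python
import math


def correlation_map(current, previous, minima):
    """Assign each peak's normalized correlation to all bins of that peak.

    `minima` lists the bins of successive minima in the current residual
    spectrum; each consecutive pair delimits one peak.
    """
    cor_map = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0
        # A minimum shared by two peaks ends up with the later peak's score.
        for k in range(a, b + 1):
            cor_map[k] = score
    return cor_map


frame = [1.0, 0.0, 2.0, 0.5, 3.0, 0.0, 1.0]
same = correlation_map(frame, frame, [1, 3, 5])
```

With identical current and previous frames every peak scores 1, which is the behaviour expected of a stable tone. The "correlation map" of US5594833A claim 1 is instead a codebook of centroids for vector quantization, so the shared wording does not imply a shared computation.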

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map (correlation map) comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .
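
Claim 5 of US8990073B2, charted above, recites a one-pole filter applied bin by bin followed by a sum over bins. A minimal sketch, assuming the usual first-order recursion y[k] = alpha * y_prev[k] + (1 - alpha) * x[k]; the update factor `alpha`, its default value, and the function names are hypothetical.

```python
def update_long_term_map(long_term, cor_map, alpha=0.9):
    """One-pole (first-order recursive) smoothing of each frequency bin."""
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term, cor_map)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


# Start from an all-zero initial map and feed the same frame map twice.
state = [0.0, 0.0, 0.0, 0.0]
frame_map = [1.0, 1.0, 0.0, 0.0]
state = update_long_term_map(state, frame_map, alpha=0.5)
first_sum = summed_long_term_map(state)
state = update_long_term_map(state, frame_map, alpha=0.5)
second_sum = summed_long_term_map(state)
```

The sum grows from frame to frame while the same bins stay correlated, which is what lets a single scalar track tonal stability over time.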

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map (correlation map) for frequency bins having a magnitude that exceeds a given fixed threshold .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .
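
The search recited in claim 7 of US8990073B2, charted above, amounts to a scan of the correlation map for bins above a fixed threshold. The sketch below is illustrative only; the function names and the threshold value are assumptions.

```python
def strong_tone_bins(cor_map, threshold):
    """Return the frequency bins whose map value exceeds the fixed threshold."""
    return [k for k, v in enumerate(cor_map) if v > threshold]


def has_strong_tone(cor_map, threshold):
    """True if at least one bin exceeds the threshold."""
    return bool(strong_tone_bins(cor_map, threshold))


bins = strong_tone_bins([0.2, 0.97, 0.5, 0.99], threshold=0.95)
```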

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map (correlation map) with an adaptive threshold indicative of sound activity in the sound signal .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (predetermined time interval) .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals (average frame energy) and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (said series) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series (first frequency) of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map (correlation map) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map (correlation map) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map (correlation map) comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US5594833A
CLAIM 1
. A sound data compressing method with which , when a series of sound data are sampled at predetermined time intervals and input as a sound data pattern having a predetermined number of samples , said input sound data pattern is replaced by an representative sound value for sorting and approximation of said input sound data pattern , said representative sound value being coded , thereby compressing said series of sound data , said method including : map forming means for forming a plurality of division regions , in which representative sound values for sorting and making approximation of said input sound data pattern in accordance with change in said input sound data pattern are loaded as the centroids , into a three-or-more dimensional correlation map (correlation map) , and for modifying the representative sound values as the centroids of said respective division regions depending on change in said input sound data pattern to update said correlation map , and coding means for converting the representative sound values , which are loaded in said respective division regions of said correlation map formed by said map forming means , into predetermined code data , whereby the representative sound values as the centroids of said respective division regions are modified depending on change in said input sound data pattern to update said three-or-more or n-dimensional correlation map , and the representative sound values of said updated correlation map are converted into predetermined code data , thereby compressing the input sound data .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JPH07114396A

Filed: 1993-10-19     Issued: 1995-05-02

Pitch detection method (ピッチ検出方法)

(Original Assignee) Sony Corp; ソニー株式会社     

Atsushi Matsumoto (淳 松本), Masayuki Nishiguchi (正之 西口)
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JPH07114396A
CLAIM 1
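[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.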

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JPH07114396A
CLAIM 1
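[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.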
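
As a reading aid, the residual-spectrum steps of claim 2 of US8990073B2 charted above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as below. This is a hedged sketch under stated assumptions: the function names are hypothetical, the minima are joined by straight-line interpolation, and the first and last bins are treated as minima so that the floor covers every bin. The claim itself does not prescribe these details.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the floor from the spectrum; also return the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


# Usage: a toy five-bin magnitude spectrum.
residual, minima = residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0])
```

Each stretch of the residual between two successive minima then corresponds to one "peak" in the sense of claim 1.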

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (音声信号) .
JPH07114396A
CLAIM 1
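[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.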

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JPH07114396A
CLAIM 1
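[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.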

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
JPH07114396A
CLAIM 1
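[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.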

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (出方法) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JPH07114396A
CLAIM 1
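[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.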

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (音声信号) is detected .
JPH07114396A
CLAIM 1
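[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.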

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (音声信号) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JPH07114396A
CLAIM 1
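[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.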

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (音声信号) based on a frequency dependent signal-to-noise ratio (SNR) .
JPH07114396A
CLAIM 1
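[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.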

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (音声信号) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JPH07114396A
CLAIM 1
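[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.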

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (音声信号) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JPH07114396A
CLAIM 1
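[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.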

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JPH07114396A
CLAIM 1
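[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.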

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JPH07114396A
CLAIM 1
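[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.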

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (音声信号) prevents updating of noise energy estimates when a music signal (出方法) is detected .
JPH07114396A
CLAIM 1
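[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.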

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (出方法) from a background noise signal and prevent update of noise energy estimates on the music signal .
JPH07114396A
CLAIM 1
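[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.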

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (音声信号) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JPH07114396A
CLAIM 1
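[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.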
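
To make the arithmetic of claim 24 of US8990073B2 (charted above) concrete, the following sketch computes a per-band energy ratio between the current and previous frame for bands above a given index, then a weighted sum of those ratios. It is an illustrative reading of the claim wording only: the function name, the plain current/previous ratio, the uniform default weights, and the `eps` guard are assumptions, and the patent specification may define the ratio differently.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous band-energy ratios for bands > min_band."""
    if len(curr_energy) != len(prev_energy):
        raise ValueError("frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(curr_energy)   # uniform weights (assumption)
    total = 0.0
    for i in range(min_band + 1, len(curr_energy)):
        ratio = curr_energy[i] / (prev_energy[i] + eps)  # eps avoids division by zero
        total += weights[i] * ratio
    return total


# Usage: four bands, only bands with index above 1 contribute.
diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], min_band=1)
```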

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (音声信号) .
JPH07114396A
CLAIM 1
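[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.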

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (音声信号) and the complementary non-stationarity parameter .
JPH07114396A
CLAIM 1
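[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.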

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH07114396A
CLAIM 1
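[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.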

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH07114396A
CLAIM 1
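[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.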

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JPH07114396A
CLAIM 1
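[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.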

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (音声信号) .
JPH07114396A
CLAIM 1
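[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.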

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (出方法) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JPH07114396A
CLAIM 1
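[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.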

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (出方法) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JPH07114396A
CLAIM 1
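[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.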

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal (出方法) from a background noise signal and preventing update of noise energy estimates .
JPH07114396A
CLAIM 1
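[Claim 1] A pitch detection method [ピッチ検出方法] (music signal) in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.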

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JPH07114396A
CLAIM 1
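[Claim 1] A pitch detection method in which an input speech signal [入力音声信号] (sound signal) is divided into blocks on the time axis and a pitch corresponding to the fundamental period of the speech is detected for the signal of each divided block, the method comprising: a step of detecting a low-frequency-side power concentration and a peak in the spectrum of the signal within the divided block; a step of detecting a local maximum in the vicinity of the detected peak; and a step of detecting the pitch based on the position of the detected peak and the position of the local maximum.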




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5406635A

Filed: 1993-02-05     Issued: 1995-04-11

Noise attenuation system

(Original Assignee) Nokia Mobile Phones Ltd     (Current Assignee) Intellectual Ventures I LLC

Kari J. Jarvinen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum (band signals, noise estimation) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (band signals, noise estimation) of the sound signal (noise attenuation) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (band signals, noise estimation) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
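The three steps of claim 4 can be sketched as follows. This is an illustrative reading only, not the patentee's implementation: the function names, the rule used to locate minima, the treatment of the first and last bins as segment boundaries, and the cosine-style normalization are all assumptions that the claim text does not fix.

```python
import math

def find_minima(spectrum):
    """Return indices of local minima; the two end bins are always included
    (an assumption made here so that every bin falls in some segment)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima

def correlation_map(current, previous):
    """Assign to every bin between two consecutive minima the normalized
    correlation of that segment with the same bins of the previous spectrum."""
    cmap = [0.0] * len(current)
    minima = find_minima(current)
    for lo, hi in zip(minima, minima[1:]):
        seg_cur = current[lo:hi + 1]
        seg_prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(seg_cur, seg_prev))
        den = math.sqrt(sum(c * c for c in seg_cur) * sum(p * p for p in seg_prev))
        score = num / den if den > 0 else 0.0  # score of the peak in this segment
        for k in range(lo, hi + 1):
            # A minimum shared by two segments takes the later segment's score.
            cmap[k] = score
    return cmap
```

Under these assumptions, a spectrum compared with itself yields a map of ones, and a peak that has no counterpart in the previous spectrum yields zeros over its segment.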
US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (band signals, noise estimation) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (band signals, noise estimation) so as to produce a summed long-term correlation map .
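A minimal sketch of the two operations in claim 5, assuming the one-pole filter is a first-order recursive smoother with a single coefficient. The coefficient `alpha`, its default value, and the function names are hypothetical; the claim does not state them.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole smoothing applied independently to each frequency bin:
    new[k] = alpha * old[k] + (1 - alpha) * cmap[k].
    alpha is an assumed smoothing coefficient, not a value from the claim."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]

def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

The per-bin state `long_term` is carried from frame to frame, so the summed value rises only when correlation stays high in the same bins across several frames.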
US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise attenuation) .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise attenuation) comprises searching in the correlation map for frequency bins (band signals, noise estimation) having a magnitude that exceeds a given fixed threshold .
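The search described in claim 7 can be illustrated as below. The threshold value and the function name are assumptions chosen for the example; the claim only requires that the threshold be fixed.

```python
def strong_tone_bins(cmap, threshold=0.95):
    """Return the indices of correlation-map bins whose magnitude exceeds a
    fixed threshold. The default 0.95 is a hypothetical value."""
    return [k for k, v in enumerate(cmap) if abs(v) > threshold]

def has_strong_tone(cmap, threshold=0.95):
    """True if at least one bin exceeds the fixed threshold."""
    return len(strong_tone_bins(cmap, threshold)) > 0
```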
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise attenuation) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (noise attenuation) is detected .
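One way to read claim 11 is as a gate on the noise tracker: when a tonal signal is flagged, the per-band noise energy estimates are left unchanged. The sketch below assumes a simple recursive update for the non-tonal case; the update rule, the coefficient `alpha`, and the function name are hypothetical and are not taken from the claim.

```python
def update_noise_estimates(noise, band_energy, tonal_detected, alpha=0.9):
    """Return the noise energy estimates for the next frame.
    If a tonal signal is detected the estimates are frozen; otherwise each
    band is smoothed toward the current band energy (assumed update rule)."""
    if tonal_detected:
        return list(noise)
    return [alpha * n + (1.0 - alpha) * e for n, e in zip(noise, band_energy)]
```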
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (noise attenuation) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (noise attenuation) based on a frequency dependent signal-to-noise ratio (SNR) .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (noise attenuation) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio, noise power) .
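Claims 13 and 15 together describe a per-band SNR computed against noise energy estimates carried over from the previous frame. A hedged sketch follows; the decibel scale, the energy floor, and the plain arithmetic mean are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def band_snr_db(band_energy, prev_noise, floor=1e-12):
    """Frequency-dependent SNR in dB, one value per band, using the noise
    energy estimates from the previous frame. floor avoids division by zero."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(band_energy, prev_noise)]

def average_snr_db(band_energy, prev_noise):
    """Unweighted mean of the per-band SNR values (assumed averaging rule)."""
    snr = band_snr_db(band_energy, prev_noise)
    return sum(snr) / len(snr)
```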
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio (noise ratio, SNR LT, SNR calculation) for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 9
. A method according to claim 8 , characterized in that the noise signal strength is estimated during periods classified as tonal , so that powers are measured of two narrow partial passbands that are separated by a frequency range that is smaller than the basic frequency on each channel of the noise attenuation system , and a lower partial passband power is defined to represent noise prevailing on the respective channel , whereby the background noise power (noise ratio, SNR LT, SNR calculation) of each channel is selected by suitably scaling the partial passband power .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise attenuation) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise attenuation) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise attenuation) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
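Claims 18 and 19 state complementary decisions on the same comparison, which can be written as a single function. How the threshold is calculated belongs to claim 14 and is not reproduced here, so it is passed in as a parameter. The claims do not say what happens when the two values are equal; treating equality as inactive is an assumption of this sketch.

```python
def classify_frame(snr_av, threshold):
    """Return 'active' when the average SNR is larger than the threshold and
    'inactive' when it is inferior to it. Equality maps to 'inactive' here,
    which is an assumption; the claims leave that case open."""
    return "active" if snr_av > threshold else "inactive"
```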
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise attenuation) prevents updating of noise energy estimates when a music signal is detected .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise attenuation) in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signals, noise estimation) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
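The two steps of claim 24 can be sketched as below. The claim does not give the weights, the orientation of the ratio, or the band index that serves as the lower limit, so `weights`, `min_band`, the current-over-previous ratio, and the energy floor are all assumptions introduced for the example.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, floor=1e-12):
    """Weighted sum, over bands with index greater than min_band, of the ratio
    between the current-frame and previous-frame band energies.
    weights defaults to 1.0 per band (assumed); floor avoids division by zero."""
    total = 0.0
    for i in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], floor)
        w = 1.0 if weights is None else weights[i]
        total += w * ratio
    return total
```

With unit weights, a frame whose upper-band energies match the previous frame gives a value equal to the number of bands summed, so larger values indicate frame-to-frame spectral change in the upper bands.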
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (noise attenuation) .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise attenuation) and the complementary non-stationarity parameter .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signals, noise estimation) into a first group (band signals, noise estimation) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signals, noise estimation) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

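The four steps of US8990073B2 claim 28 above (split the bands into two groups, compute one energy per group, take the ratio, track a long-term value) can be sketched as follows. The claim does not fix how the long-term value is computed, so the first-order smoothing, the alpha constant and the function name noise_character are assumptions made for illustration only.

```python
def noise_character(band_energies, n_first, prev_long_term, alpha=0.9):
    """Return (ratio, long_term) for one frame.

    band_energies: energy per frequency band, lowest band first.
    n_first: number of bands in the first group; the rest form the second group.
    prev_long_term: long-term value carried over from the previous frame.
    alpha: hypothetical smoothing constant (not taken from the patent).
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave both groups non-empty")
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    eps = 1e-12  # guards against division by zero in silent frames
    ratio = first_energy / (second_energy + eps)
    # Assumed long-term tracking: exponential smoothing of the per-frame ratio.
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term
```

For example, noise_character([4.0, 4.0, 1.0, 1.0], 2, 0.0, alpha=0.5) yields a ratio close to 4 and a long-term value close to 2.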
US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum (band signals, noise estimation) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

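The correlation map element of US8990073B2 claim 30 above compares each detected peak (a piece of the current residual spectrum between two successive minima) with the shape found at the same position in the previous residual spectrum. A minimal sketch follows. The claim does not state the correlation measure, so the normalised cross-correlation used here is an assumption, as are the function name correlation_map and the choice to write one value per bin of each peak.

```python
import math

def correlation_map(cur_res, prev_res, minima):
    """Per-bin correlation map for one frame.

    cur_res, prev_res: current and previous residual spectra (same length).
    minima: sorted bin indices of the minima that delimit the peaks.
    Every bin of a peak receives that peak's correlation value; a bin shared
    by two adjacent peaks keeps the value of the later peak.
    """
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        cur = cur_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Assumed measure: normalised cross-correlation, 0 when either piece is empty.
        value = num / den if den > 0.0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap
```

A peak that keeps its shape and position from one frame to the next scores near 1, while a peak with no counterpart in the previous frame scores 0, which is why a map of this kind can indicate tonal stability.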
US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise attenuation) using a frequency spectrum (band signals, noise estimation) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (band signals, noise estimation) of the sound signal (noise attenuation) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

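US8990073B2 claim 32 above recites three components: a locator of spectral minima, an estimator of a spectral floor that connects the minima, and a subtractor. A minimal sketch of that chain is given below. The straight-line connection between minima, the treatment of the bins outside the first and last minimum, and all function names are assumptions for illustration; the claim only says the floor "connects the minima".

```python
def local_minima(spectrum):
    """Indices of strict interior minima of the spectrum."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1] > spectrum[i] < spectrum[i + 1]]

def spectral_floor(spectrum, minima):
    """Piecewise-linear floor through successive minima (assumed shape).

    Outside the first and last minimum the floor equals the spectrum,
    so the residual is zero there (also an assumption).
    """
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor

def residual_spectrum(spectrum):
    """Return (residual, minima) where residual = spectrum - floor."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

For the toy spectrum [5, 1, 4, 2, 6, 3, 7] the minima sit at bins 1, 3 and 5, and the residual is non-zero only at bins 2 and 4, the tops of the two peaks.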
US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (band signals, noise estimation) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (band signals, noise estimation) so as to produce a summed long-term correlation map .
US5406635A
CLAIM 2
. A noise attenuation system according to claim 1 , characterized in that the background noise measurement system comprises a splitting filter group for providing a plurality of first narrow passband signals (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) , a background noise estimation (frequency dependent signal, frequency bands, first frequency bands, frequency bins, first group, first frequency, first energy, frequency spectrum, frequency bin, frequency bin basis, first energy value) block for providing a noise estimate for each first narrow passband signal during a tonal period , and a channel power estimation block for estimating power in each said first narrow passband signal .

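US8990073B2 claim 33 above recites filtering the correlation map bin by bin and summing the filtered map over the bins. The sketch below assumes a first-order recursive filter driven by an update factor and an initial value, which matches the quantities named in claim 31 but is not stated as a formula in the claims; the function name filter_and_sum and the default update_factor are hypothetical.

```python
def filter_and_sum(corr_maps, update_factor=0.9, initial=0.0):
    """Filter a sequence of per-frame correlation maps and sum the result.

    corr_maps: list of per-frame correlation maps, oldest first, equal lengths.
    update_factor: assumed weight of the previous long-term value.
    initial: assumed starting value of the long-term map in every bin.
    Returns (long_term_map, summed_value) after the last frame.
    """
    n_bins = len(corr_maps[0])
    long_term = [initial] * n_bins
    for cmap in corr_maps:
        # Each bin is filtered independently of its neighbours.
        long_term = [update_factor * lt + (1.0 - update_factor) * c
                     for lt, c in zip(long_term, cmap)]
    return long_term, sum(long_term)
```

A downstream detector could compare the summed value with a threshold to decide whether the signal is tonally stable; that comparison is outside what claim 33 recites and is not shown.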
US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise attenuation) .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (noise attenuation) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio, noise power) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US5406635A
CLAIM 1
. A noise attenuation system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio (noise ratio, SNR LT, SNR calculation) for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US5406635A
CLAIM 9
. A method according to claim 8 , characterized in that the noise signal strength is estimated during periods classified as tonal , so that powers are measured of two narrow partial passbands that are separated by a frequency range that is smaller than the basic frequency on each channel of the noise attenuation system , and a lower partial passband power is defined to represent noise prevailing on the respective channel , whereby the background noise power (noise ratio, SNR LT, SNR calculation) of each channel is selected by suitably scaling the partial passband power .

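The comparator of US8990073B2 claim 38 above tests an average signal to noise ratio (SNR av) against a threshold that depends on the long-term signal to noise ratio (SNR LT). The claim does not give the threshold function, so the clipped linear mapping below, its coefficients and the function names are purely hypothetical and only show the structure of such a comparator.

```python
def snr_threshold(snr_lt_db, slope=0.25, offset=2.0, lo=1.0, hi=10.0):
    """Hypothetical threshold: linear in the long-term SNR, clipped to [lo, hi]."""
    return min(hi, max(lo, slope * snr_lt_db + offset))

def is_active(snr_av_db, snr_lt_db):
    """Declare sound activity when the average SNR exceeds the adaptive threshold."""
    return snr_av_db > snr_threshold(snr_lt_db)
```

Tying the threshold to the long-term SNR lets the detector demand more evidence in clean conditions and less in noisy ones; whether the patent's own function rises or falls with SNR LT is not stated in the claim text charted here.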
US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise attenuation) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise attenuation) .
US5406635A
CLAIM 1
. A noise attenuation (sound signal) system for attenuating noise in a signal , comprising a filter for dividing the signal into a plurality of channels of a predetermined bandwidth , means for calculating the signal strength in each channel and estimating the noise strength in each channel , said system further comprising ;
a buffer for processing of the signal sequentially in distinct time periods of a predetermined length , and having an output coupled to a tonality decision block and said filter , said tonality decision block classifying each period as tonal or toneless , a background noise measurement system coupled to said filter and responsive to signals in said plurality of channels , for determining an estimate of background noise strength , said background noise measurement system splitting frequency components of signals from at least one channel into plural frequency ranges during each time period in which a tonal signal is indicated for said channel , and determining from one split frequency range , said background noise strength and a signal to noise ratio for each channel in each time period , a gain calculation block for determining a gain coefficient for each channel in each time period , such that the channel attenuation is increased for a decreasing signal to noise ratio , the gain calculation block coupled to a multiplication block for amplifying the signals in the plurality of channels dependent on the determined gain coefficient , and an assembly filter which reassembles noise attenuated channels to produce a noise attenuated signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1071036A

Filed: 1992-06-11     Issued: 1993-04-14

Variable rate vocoder (可变速率声码器)

(Original Assignee) Qualcomm Incorporated (夸尔柯姆股份有限公司)

保罗·E·雅各布, 威廉·R·加德纳, 冲·U·李, 克莱恩·S·吉豪森, S·凯瑟琳·兰姆, 民昌·蔡
US8990073B2
CLAIM 10
. A method for detecting sound activity (声音信号 "sound signal") in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (背景噪声 "background noise", 一对应 "a corresponding") ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1071036A
CLAIM 1
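. A method of speech signal compression by variable rate coding of frames of digitized speech samples , characterized in that it comprises the following steps : determining a level of speech activity for a frame of digitized speech samples ; selecting an encoding rate from a set of rates based on said determined level of speech activity in said frame ; each rate having a corresponding [一对应] (background noise signal) different coding format , coding said frame according to the predetermined coding format at said selected rate ; providing a corresponding output data packet for said frame at said selected rate .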

CN1071036A
CLAIM 5
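. In a code excited linear prediction (CELP) coder , a method of variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise [背景噪声] (background noise signal) , characterized in that it comprises : computing linear predictive coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples ;
selecting an output data packet rate for each frame from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame , wherein the number of pitch subframes for each frame is determined by said selected rate , and said pitch parameters for each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame , wherein the number of codebook analysis subframes for each frame is determined by said selected rate , and said codebook parameters for each codebook subframe are represented by a number of bits determined according to said selected rate ;
and providing for each frame a corresponding output data packet of bits representing said LPC coefficients , and the pitch parameters and codebook parameters for each respective pitch and codebook subframe .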

CN1071036A
CLAIM 6
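. An apparatus for compressing a sound signal [声音信号] (detecting sound activity) into variable rate data , characterized in that it comprises : means for determining a level of sound activity for an input frame of digitized samples of said sound signal ;
means for selecting , based on said determined level of sound activity in said frame , an output data rate from a set of rates ;
means for coding said frame according to a predetermined coding format at said selected rate , each rate having a corresponding different coding format ;
means for providing a corresponding output data packet for said frame at a data rate corresponding to said selected rate .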

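CN1071036A claims 1 and 6 above recite determining a level of speech or sound activity for a frame and selecting one rate from a set of rates according to that level. A minimal sketch of that selection logic follows. The claims name neither the activity measure nor the rates, so the mean-square frame energy, the three thresholds and the four rate labels are hypothetical placeholders, not values from the reference.

```python
RATES = ("eighth", "quarter", "half", "full")  # hypothetical rate set

def frame_energy(samples):
    """Mean-square energy of one frame of digitized samples."""
    return sum(x * x for x in samples) / len(samples)

def select_rate(samples, thresholds=(1.0, 10.0, 100.0)):
    """Map the frame's activity level to a rate.

    The activity level is the number of thresholds the frame energy exceeds
    (0 to 3); thresholds must be ascending. Both choices are assumptions.
    """
    energy = frame_energy(samples)
    level = sum(energy > t for t in thresholds)
    return RATES[level]
```

A silent frame maps to the lowest rate and a loud frame to the highest. Note that nothing in this logic distinguishes music from background noise, which is the purpose the charted US8990073B2 claims assign to the tonal stability parameter.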
US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (预测编码 "predictive coding") residual error energies .
CN1071036A
CLAIM 5
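. In a code excited linear prediction (CELP) coder , a method of variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise , characterized in that it comprises : computing linear predictive coding [预测编码] (linear prediction) (LPC) coefficients for each of a sequence of input frames of digitized speech samples ;
selecting an output data packet rate for each frame from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame , wherein the number of pitch subframes for each frame is determined by said selected rate , and said pitch parameters for each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame , wherein the number of codebook analysis subframes for each frame is determined by said selected rate , and said codebook parameters for each codebook subframe are represented by a number of bits determined according to said selected rate ;
and providing for each frame a corresponding output data packet of bits representing said LPC coefficients , and the pitch parameters and codebook parameters for each respective pitch and codebook subframe .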

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (背景噪声 "background noise", 一对应 "a corresponding") and prevent update of noise energy estimates on the music signal .
CN1071036A
CLAIM 1
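. A method of speech signal compression by variable rate coding of frames of digitized speech samples , characterized in that it comprises the following steps : determining a level of speech activity for a frame of digitized speech samples ; selecting an encoding rate from a set of rates based on said determined level of speech activity in said frame ; each rate having a corresponding [一对应] (background noise signal) different coding format , coding said frame according to the predetermined coding format at said selected rate ; providing a corresponding output data packet for said frame at said selected rate .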

CN1071036A
CLAIM 5
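. In a code excited linear prediction (CELP) coder , a method of variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise [背景噪声] (background noise signal) , characterized in that it comprises : computing linear predictive coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples ;
selecting an output data packet rate for each frame from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame , wherein the number of pitch subframes for each frame is determined by said selected rate , and said pitch parameters for each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame , wherein the number of codebook analysis subframes for each frame is determined by said selected rate , and said codebook parameters for each codebook subframe are represented by a number of bits determined according to said selected rate ;
and providing for each frame a corresponding output data packet of bits representing said LPC coefficients , and the pitch parameters and codebook parameters for each respective pitch and codebook subframe .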

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (一个输出 "an output") so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1071036A
CLAIM 5
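. In a code excited linear prediction (CELP) coder , a method of variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise , characterized in that it comprises : computing linear predictive coding (LPC) coefficients for each of a sequence of input frames of digitized speech samples ;
selecting for each frame , from a set of data packet rates and based on at least one of said LPC coefficients , an output [一个输出] (second energy values) data packet rate ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame , wherein the number of pitch subframes for each frame is determined by said selected rate , and said pitch parameters for each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame , wherein the number of codebook analysis subframes for each frame is determined by said selected rate , and said codebook parameters for each codebook subframe are represented by a number of bits determined according to said selected rate ;
and providing for each frame a corresponding output data packet of bits representing said LPC coefficients , and the pitch parameters and codebook parameters for each respective pitch and codebook subframe .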

US8990073B2
CLAIM 35
. A device for detecting sound activity (声音信号 "sound signal") in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (背景噪声 "background noise", 一对应 "a corresponding") ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1071036A
CLAIM 1
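. A method of speech signal compression by variable rate coding of frames of digitized speech samples, characterized in that it comprises the steps of: determining a level of speech activity for a frame of digitized speech samples; selecting an encoding rate from a set of rates based on said determined level of speech activity in said frame; each rate having a corresponding (background noise signal) different coding format, coding said frame according to the predetermined coding format at said selected rate; and providing for said frame a corresponding output data packet at said selected rate.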

CN1071036A
CLAIM 5
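. In a code-excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise (background noise signal), characterized in that it comprises: computing linear predictive coding (LPC) coefficients for each of a series of input frames of digitized speech samples ;
selecting, for each frame, an output data packet rate from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined in accordance with said selected rate ;
and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.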

CN1071036A
CLAIM 6
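. An apparatus for compressing a sound signal (detecting sound activity) into variable rate data, characterized in that it comprises: means for determining a level of sound activity for an input frame of digitized samples of said sound signal ;
means for selecting an output data rate from a set of rates based on said determined level of sound activity in said frame ;
means for coding said frame according to a predetermined coding format at said selected rate, each rate having a corresponding different coding format ;
means for providing for said frame a corresponding output data packet at a data rate corresponding to said selected rate.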

US8990073B2
CLAIM 36
. A device for detecting sound activity (sound signal) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise, a corresponding) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1071036A
CLAIM 1
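. A method of speech signal compression by variable rate coding of frames of digitized speech samples, characterized in that it comprises the steps of: determining a level of speech activity for a frame of digitized speech samples; selecting an encoding rate from a set of rates based on said determined level of speech activity in said frame; each rate having a corresponding (background noise signal) different coding format, coding said frame according to the predetermined coding format at said selected rate; and providing for said frame a corresponding output data packet at said selected rate.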

CN1071036A
CLAIM 5
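. In a code-excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise (background noise signal), characterized in that it comprises: computing linear predictive coding (LPC) coefficients for each of a series of input frames of digitized speech samples ;
selecting, for each frame, an output data packet rate from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined in accordance with said selected rate ;
and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.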

CN1071036A
CLAIM 6
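. An apparatus for compressing a sound signal (detecting sound activity) into variable rate data, characterized in that it comprises: means for determining a level of sound activity for an input frame of digitized samples of said sound signal ;
means for selecting an output data rate from a set of rates based on said determined level of sound activity in said frame ;
means for coding said frame according to a predetermined coding format at said selected rate, each rate having a corresponding different coding format ;
means for providing for said frame a corresponding output data packet at a data rate corresponding to said selected rate.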

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise, a corresponding) and preventing update of noise energy estimates .
CN1071036A
CLAIM 1
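. A method of speech signal compression by variable rate coding of frames of digitized speech samples, characterized in that it comprises the steps of: determining a level of speech activity for a frame of digitized speech samples; selecting an encoding rate from a set of rates based on said determined level of speech activity in said frame; each rate having a corresponding (background noise signal) different coding format, coding said frame according to the predetermined coding format at said selected rate; and providing for said frame a corresponding output data packet at said selected rate.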

CN1071036A
CLAIM 5
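. In a code-excited linear prediction (CELP) coder, a method for variable rate coding of input frames of digitized samples of an acoustic signal consisting primarily of speech and background noise (background noise signal), characterized in that it comprises: computing linear predictive coding (LPC) coefficients for each of a series of input frames of digitized speech samples ;
selecting, for each frame, an output data packet rate from a set of data packet rates based on at least one of said LPC coefficients ;
limiting the number of bits representing the LPC coefficients to a predetermined number determined by said selected rate ;
determining pitch parameters for each pitch subframe of a set of constituent pitch analysis subframes of each frame, wherein the number of pitch subframes per frame is determined by said selected rate, and said pitch parameters of each pitch subframe are represented by a number of bits determined by said selected rate ;
determining codebook parameters for each codebook subframe of a set of constituent codebook analysis subframes of each frame, wherein the number of codebook analysis subframes per frame is determined by said selected rate, and said codebook parameters of each codebook subframe are represented by a number of bits determined in accordance with said selected rate ;
and providing, for each frame, a corresponding output data packet of bits representing said LPC coefficients, and providing pitch parameters and codebook parameters for each respective pitch and codebook subframe.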




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1381956A

Filed: 1992-06-11     Issued: 2002-11-27

Variable rate vocoder

(Original Assignee) Qualcomm Inc     

Paul E. Jacobs, William R. Gardner, Chong U. Lee, Klein S. Gilhousen, S. Katherine Lam, Ming-Chang Tsai
US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (predictive coding) residual error energies .
CN1381956A
CLAIM 9
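. In a decoder for receiving frames of data encoded according to a linear predictive coding (linear prediction) algorithm, an apparatus for masking frame errors, characterized in that it comprises: storage means for storing parameter data of a correctly received frame; and masking means for using at least one parameter stored in said storage means to replace at least one parameter of a frame received in error.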

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (at least one parameter) indicative of an activity of the sound signal .
CN1381956A
CLAIM 9
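. In a decoder for receiving frames of data encoded according to a linear predictive coding algorithm, an apparatus for masking frame errors, characterized in that it comprises: storage means for storing parameter data of a correctly received frame; and masking means for using at least one parameter (activity prediction parameter) stored in said storage means to replace at least one parameter of a frame received in error.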

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (at least one parameter) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1381956A
CLAIM 9
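. In a decoder for receiving frames of data encoded according to a linear predictive coding algorithm, an apparatus for masking frame errors, characterized in that it comprises: storage means for storing parameter data of a correctly received frame; and masking means for using at least one parameter (activity prediction parameter) stored in said storage means to replace at least one parameter of a frame received in error.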

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (at least one parameter) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
CN1381956A
CLAIM 9
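. In a decoder for receiving frames of data encoded according to a linear predictive coding algorithm, an apparatus for masking frame errors, characterized in that it comprises: storage means for storing parameter data of a correctly received frame; and masking means for using at least one parameter (activity prediction parameter) stored in said storage means to replace at least one parameter of a frame received in error.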
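
Claims 26 and 27 of US8990073B2 above combine a long-term value of a binary decision with a two-threshold gate on the noise update. The sketch below shows one way that logic could look. The recursive averaging form, the factor `alpha`, the function names, and every threshold value are hypothetical; the claims state only that a long-term value is computed and that both thresholds must be exceeded simultaneously.

```python
def update_activity_prediction(prev_value, binary_decision, alpha=0.9):
    """Long-term value of a per-frame binary decision.

    binary_decision: True when the tonal-stability / non-stationarity
                     logic flags the frame (how that flag is derived is
                     outside this sketch).
    alpha:           smoothing factor (assumed, not from the claims).
    """
    return alpha * prev_value + (1.0 - alpha) * (1.0 if binary_decision else 0.0)


def noise_update_allowed(act_pred, nonstat, act_threshold, nonstat_threshold):
    """Return False when the noise energy update must be prevented.

    The update is blocked only if BOTH parameters exceed their fixed
    thresholds at the same time, as recited in claim 27.
    """
    blocked = act_pred > act_threshold and nonstat > nonstat_threshold
    return not blocked
```

Feeding two consecutive flagged frames through `update_activity_prediction` with `alpha=0.5` raises the long-term value from 0.0 to 0.5 and then 0.75, at which point a sufficiently large non-stationarity value would block the noise update.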




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1107231A2

Filed: 1992-06-03     Issued: 2001-06-13

Variable rate decoder

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

William R. Gardner, Klein S. Gilhousen, Paul E. Jacobs, S. Katherine Lam, Chong U. Lee, Ming-Chang Tsai
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (autocorrelation coefficients) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1107231A2
CLAIM 48
A method of speech signal compression by variable rate coding of frames of digitized speech samples comprising the steps of : multiplying one frame of digitized speech samples in a sequence of said frames of digitized speech samples by a windowing function to provide a windowed frame of speech data ;
calculating a set of autocorrelation coefficients (correlation value) from said windowed frame of speech ;
determining an encoding rate from said set of autocorrelation coefficients ;
calculating from said set of autocorrelation coefficients a set of linear predictive coding (LPC) coefficients ;
converting said set of LPC coefficients to a set of line spectral pair values ;
quantizing said set of line spectral pair (LSP) coefficients in accordance with said rate command and said encoding rate ;
selecting a pitch value from a predetermined set of pitch values to provide a selected pitch value for each pitch subframe in each frame of digitized speech ;
quantizing said selected pitch value in accordance with said encoding rate and said rate command ;
selecting a codebook value from a predetermined set of pitch values to provide a selected pitch value for a pitch frame ;
quantizing said selected codebook value in accordance with said encoding rate and said rate command ;
and generating an output data packet comprising said quantized line spectral pair values , quantized selected pitch value , and quantized selected codebook value .
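
For comparison with the autocorrelation step in the reference claim, the per-peak correlation map of claim 4 of US8990073B2 can be sketched as follows. This is an illustrative reading only: the cosine-style normalization, the treatment of the first and last bins as minima, and the helper name `find_minima` are assumptions, since the claim does not give the normalization formula.

```python
import math


def find_minima(spectrum):
    """Indices of local minima; both end bins are included by assumption."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def correlation_map(current, previous):
    """Per-bin map holding, for each peak, its normalized correlation score.

    current, previous: residual spectra of the current and previous frame,
                       given as equal-length lists of floats.
    """
    cor_map = [0.0] * len(current)
    minima = find_minima(current)
    # Each peak is the piece of spectrum between two consecutive minima.
    for lo, hi in zip(minima, minima[1:]):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0
        # Assign the peak's score to every bin it spans; a bin shared by
        # two peaks keeps the score of the later peak.
        for k in range(lo, hi + 1):
            cor_map[k] = score
    return cor_map
```

When the previous residual spectrum has the same shape as the current one under a peak, the score for that peak is 1.0; when the previous spectrum is empty under the peak, the score falls back to 0.0.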

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (linear predictive coefficient, previous frame, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (linear predictive coefficient, previous frame, speech signal) indicative of an activity of the sound signal .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (linear predictive coefficient, previous frame, speech signal) comprises : calculating a long-term value of a binary decision (linear prediction coefficient, block code) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .

EP1107231A2
CLAIM 6
The method of claim 1 wherein said step coding said frame comprises : generating for said frame a variable number of linear prediction coefficients (binary decision) wherein said variable number of said linear prediction coefficients is responsive to said selected encoding rate ;
generating for said frame a variable number of pitch coefficients wherein said variable number of said pitch coefficients is responsive to said selected encoding rate ;
and generating for said frame a variable number of codebook excitation values wherein said variable number of said codebook excitation values is responsive to said selected encoding rate .

EP1107231A2
CLAIM 12
The method of claim 8 wherein said step of generating error protection for said data packet further comprises determining the values of said error protection bits in accordance with a cyclic block code (binary decision) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (linear predictive coefficient, previous frame, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (linear predictive coefficient, previous frame, speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (linear predictive coefficient, previous frame, speech signal) inferior than a given fixed threshold .
EP1107231A2
CLAIM 1
A method of speech signal (activity prediction parameter, noise character parameter) compression , by variable rate coding of frames of digitized speech samples , comprising the steps of : determining a level of speech activity for a frame of digitized speech samples ;
selecting an encoding rate from a set of rates based upon said determined level of speech activity for said frame ;
coding said frame according to a coding format of a set of coding formats for said selected rate wherein each rate has a corresponding different coding format and wherein each coding format provides for a different plurality of parameter signals representing said digitized speech samples in accordance with a speech model ;
and generating for said frame a data packet of said parameter signals .

EP1107231A2
CLAIM 2
The method of claim 1 wherein said step of determining said level of frame speech activity comprises the steps of : measuring speech activity in said frame of digitized speech samples ;
comparing said measured speech activity with at least one speech activity threshold level of a predetermined set of activity threshold levels ;
and adaptively adjusting in response to said comparison at least one of said at least one speech activity threshold levels with respect to a level of activity of a previous frame (activity prediction parameter, noise character parameter) of digitized speech samples .

EP1107231A2
CLAIM 5
The method of claim 1 wherein said step of providing said data packet of said parameter signals comprises : generating a variable number of bits to represent linear predictive coefficient (activity prediction parameter, noise character parameter) (LPC) vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said LPC vector signals is responsive to said measured speech activity level ;
generating a variable number of bits to represent pitch vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said pitch vector signals is responsive to said measured speech activity level ;
and generating variable number of bits to represent codebook excitation vector signals of said frame of digitized speech samples , wherein said variable number of bits representing said codebook excitation vector signals is responsive to said measured speech activity level .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1162601A2

Filed: 1992-06-03     Issued: 2001-12-12

Variable rate vocoder

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

William R. Gardner, Klein S. Gilhousen, Paul E. Jacobs, S. Katherine Lam, Chong U. Lee, Ming-Chang Tsai
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .
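
The long-term correlation map in claim 1 of US8990073B2 above is defined only in terms of an update factor, the current frame's correlation map, and an initial value. A minimal sketch, assuming a first-order recursive update, a zero initial map, and a tonal decision made by comparing the summed map against a threshold, might look like this. The names `alpha` and `threshold` and their values are hypothetical.

```python
def update_long_term_map(long_term, cor_map, alpha=0.9):
    """Blend the current correlation map into the long-term map, bin by bin.

    alpha is the update factor (assumed to weight the history).
    """
    if len(long_term) != len(cor_map):
        raise ValueError("maps must have the same number of bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]


def tonal_stability_flags(frame_maps, alpha=0.9, threshold=2.0):
    """Return one boolean per frame: True when the signal looks tonally stable.

    frame_maps: list of per-frame correlation maps (lists of floats).
    threshold:  decision threshold on the summed long-term map (assumed).
    """
    # Initial value of the long-term map; zero is an assumption.
    long_term = [0.0] * len(frame_maps[0])
    flags = []
    for cor_map in frame_maps:
        long_term = update_long_term_map(long_term, cor_map, alpha)
        flags.append(sum(long_term) > threshold)
    return flags
```

With a steady run of highly correlated frames the summed long-term map climbs gradually, so the flag turns on only after the stability has persisted for several frames.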

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .
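
The residual spectrum of claim 2 of US8990073B2 above (search for minima, connect them to form a spectral floor, subtract) can be sketched as below. Linear interpolation between minima and the inclusion of the first and last bins as minima are assumptions; the claim says only that the minima are connected with each other.

```python
def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor from a magnitude spectrum.

    spectrum: list of floats with at least two bins.
    Returns the residual as a list of the same length.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("spectrum needs at least two bins")

    # Step 1: search for the minima (end bins included by assumption).
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)

    # Step 2: estimate the floor by connecting consecutive minima with
    # straight lines.
    floor = [0.0] * n
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])

    # Step 3: subtract the floor to obtain the residual spectrum.
    return [s - f for s, f in zip(spectrum, floor)]
```

Because the end bins are forced to act as minima, a spectrum that peaks at an edge can yield negative residual values near that edge; a real implementation would need its own rule for that case.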

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (random code) when a tonal sound signal is detected .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random codebook (noise energy estimates, term signal) excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (random code) calculated in a previous frame in a SNR calculation .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random codebook (noise energy estimates, term signal) excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (random code) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random codebook (noise energy estimates, term signal) excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (random code) when a music signal is detected .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random codebook (noise energy estimates, term signal) excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (random code) on the music signal .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random code (noise energy estimates, term signal) book excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (random code) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random code (noise energy estimates, term signal) book excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (random code) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random code (noise energy estimates, term signal) book excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1162601A2
CLAIM 4
The method of Claim 1 wherein said determining said new estimate of background noise comprises : comparing said measured energy of said current frame (current frame) of speech to said previous estimate of said background noise ;
and computing said new estimate of background noise based on said measured energy of said current speech frame and a previous estimate of said background noise , and the result of said comparison of said measured energy of said current frame of speech to said previous estimate of said background noise .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (gain parameter, second value) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
EP1162601A2
CLAIM 5
The method of Claim 4 wherein said determining said new estimate of background noise comprises selecting the minimum of said energy of said current speech frame and a second value (noise ratio) determined in accordance with said previous estimate of said background noise .

EP1162601A2
CLAIM 27
The apparatus of Claim 23 wherein said masking means replaces a codebook gain parameter (noise ratio) in said frame received in error with a value approximately equal to zero .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (random code) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random code (noise energy estimates, term signal) book excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (random code) .
EP1162601A2
CLAIM 30
The apparatus of Claims 23 and 29 wherein said masking means selects a random code (noise energy estimates, term signal) book excitation vector index and replaces a codebook excitation vector index of said frame received in error with said randomly selected codebook excitation vector index .




EP1239456A1

Filed: 1992-06-03     Issued: 2002-09-11

Variable rate vocoder

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

William R. Gardner, Klein S. Gilhousen, Paul E. Jacobs, Katherine S. Lam, Chong U. Lee, Ming-Chang Tsai
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .
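
For reference when weighing this mapping, the last step of claim 1 of US8990073B2 (a long-term correlation map built from an update factor, the current frame's correlation map, and an initial value) can be read as a recursive smoothing of the per-frame map. The sketch below is only one possible reading: the first-order recursion, the function name `update_long_term_map`, and the example update factor are assumptions, not taken from the patent or from EP1239456A1.

```python
def update_long_term_map(long_term_map, current_map, update_factor=0.9):
    """One frame of a first-order recursive average (assumed form).

    long_term_map : previous long-term values per frequency bin; on the
                    first frame this holds the claim's "initial value"
    current_map   : correlation map of the current frame
    update_factor : hypothetical smoothing constant in [0, 1)
    """
    if len(long_term_map) != len(current_map):
        raise ValueError("maps must cover the same frequency bins")
    return [
        update_factor * lt + (1.0 - update_factor) * cur
        for lt, cur in zip(long_term_map, current_map)
    ]


# Start from an all-zero initial value and feed the same map three times.
lt_map = [0.0, 0.0, 0.0]
for _ in range(3):
    lt_map = update_long_term_map(lt_map, [1.0, 0.0, 0.5], update_factor=0.5)
```

Under this reading, bins whose per-frame correlation stays high across frames accumulate a high long-term value, which is what would make the map indicative of tonal stability.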

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .
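
The three steps of claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is an illustrative reading only: the piecewise-linear connection of the minima, the treatment of the two end bins as minima, and the names `find_minima` and `residual_spectrum` are assumptions that the claim text does not fix.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are always included (assumption)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def residual_spectrum(spectrum):
    """Subtract a floor that linearly connects neighbouring minima (assumption)."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for left, right in zip(minima, minima[1:]):
        span = right - left
        for i in range(left, right + 1):
            t = (i - left) / span
            floor[i] = (1.0 - t) * spectrum[left] + t * spectrum[right]
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0])
```

In this example the minima sit at bins 0, 2 and 4, so the residual is zero at those bins and keeps only the parts of the spectrum that rise above the connecting floor.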

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (second limiter, first limiter) of the current residual spectrum .
EP1239456A1
CLAIM 12
The speech processor of claim 11 , further comprising a first limiter (consecutive minima, two consecutive minima) coupled to the second circuit and configured to limit the increased estimate value to a value that is below a predefined level .

EP1239456A1
CLAIM 13
The speech processor of claim 12 , further comprising a second limiter (consecutive minima, two consecutive minima) coupled to the first multiplexer and configured to limit the estimate of background noise in a current frame of the speech signal to a value that is less than or equal to the energy level .
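
To make the claim 3 language concrete, a peak search that locates one maximum between each pair of consecutive minima of the residual spectrum could look like the sketch below. The function name `locate_peaks` and the choice to return index triples are hypothetical, and the minima indices are assumed to have been found beforehand.

```python
def locate_peaks(residual, minima):
    """Return (left_min, peak, right_min) index triples.

    residual : residual spectrum, one value per frequency bin
    minima   : increasing indices of minima of the residual (assumed given)
    """
    peaks = []
    for left, right in zip(minima, minima[1:]):
        segment = range(left, right + 1)
        # The first bin holding the largest value wins if there is a tie.
        peak = max(segment, key=lambda i: residual[i])
        peaks.append((left, peak, right))
    return peaks


peaks = locate_peaks([0.0, 2.0, 0.0, 2.5, 0.0], [0, 2, 4])
```

Each triple delimits one "piece" of the residual spectrum in the sense of claim 1, which is a different operation from the limiters of EP1239456A1 claims 12 and 13, where a scalar noise estimate is clamped.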

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (second limiter, first limiter) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1239456A1
CLAIM 12
The speech processor of claim 11 , further comprising a first limiter (consecutive minima, two consecutive minima) coupled to the second circuit and configured to limit the increased estimate value to a value that is below a predefined level .

EP1239456A1
CLAIM 13
The speech processor of claim 12 , further comprising a second limiter (consecutive minima, two consecutive minima) coupled to the first multiplexer and configured to limit the estimate of background noise in a current frame of the speech signal to a value that is less than or equal to the energy level .
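
The correlation map of claim 4 can be illustrated with a short sketch: for each peak, a normalized correlation against the previous residual spectrum is computed over the bins between the two delimiting minima, and that score is written to every bin of the peak. The exact normalization is not given in the claim text reproduced here, so the cosine-style form below, like the name `correlation_map`, is an assumption.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding each peak's normalized correlation (assumed form).

    current, previous : residual spectra of the current and previous frame
    minima            : increasing indices of minima of the current residual
    """
    cor_map = [0.0] * len(current)
    for left, right in zip(minima, minima[1:]):
        bins = range(left, right + 1)
        cross = sum(current[i] * previous[i] for i in bins)
        norm = math.sqrt(
            sum(current[i] ** 2 for i in bins)
            * sum(previous[i] ** 2 for i in bins)
        )
        score = cross / norm if norm > 0.0 else 0.0
        for i in bins:
            # A minimum shared by two peaks keeps the later peak's score.
            cor_map[i] = score
    return cor_map


cor_map = correlation_map(
    [0.0, 2.0, 0.0, 2.5, 0.0],  # current residual
    [0.0, 2.0, 0.0, 0.0, 0.0],  # previous residual
    [0, 2, 4],
)
```

Here the first peak has the same shape in both frames and scores 1.0, while the second peak has no counterpart in the previous frame and scores 0.0.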

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimation is performed according to claim 1 .
EP1239456A1
CLAIM 14
The speech processor of claim 12 , wherein the second circuit comprises : a first adder (background noise signal) configured to add a nominal constant value to the estimate of background noise in a previous frame of the speech signal to generate an absolutely increased estimate value ;
a multiplier configured to multiply the estimate of background noise in a previous frame of the speech signal by a constant value that is marginally greater than one to generate a percentage increased estimate value ;
a second multiplexer coupled to the first adder and the multiplier and configured to receive the absolutely increased estimate value and the percentage increased estimate value ;
a third circuit coupled to the first adder , the second multiplier , and the second multiplexer , and configured to control the second multiplexer to select the larger of the absolutely increased estimate value and the percentage increased estimate value as the increased estimate value .
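
For comparison with the target claim, the background-noise update recited across EP1239456A1 claims 11 to 14 can be summarized in a few lines. The sketch is a reading of the claim language only: the function name and the numeric defaults (`step`, `growth`, `ceiling`) are hypothetical placeholders, not values from the reference.

```python
def update_noise_estimate(frame_energy, prev_estimate,
                          step=1.0, growth=1.01, ceiling=1.0e6):
    """Current background-noise estimate from the previous one.

    step, growth, ceiling are hypothetical constants standing in for the
    "nominal constant value", the factor "marginally greater than one",
    and the "predefined level" of the claims.
    """
    # Claim 14: take the larger of the absolute and the percentage increase.
    increased = max(prev_estimate + step, prev_estimate * growth)
    # Claim 12: first limiter caps the increased estimate.
    increased = min(increased, ceiling)
    # Claims 11 and 13: the estimate never exceeds the frame energy.
    return min(increased, frame_energy)


rising = update_noise_estimate(frame_energy=500.0, prev_estimate=10.0)
falling = update_noise_estimate(frame_energy=5.0, prev_estimate=10.0)
```

The estimate creeps upward slowly while the frame energy is high and drops immediately when a quieter frame arrives. Nothing in this update involves a frequency spectrum, peaks, or a correlation map.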

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first adder) and prevent update of noise energy estimates on the music signal .
EP1239456A1
CLAIM 14
The speech processor of claim 12 , wherein the second circuit comprises : a first adder (background noise signal) configured to add a nominal constant value to the estimate of background noise in a previous frame of the speech signal to generate an absolutely increased estimate value ;
a multiplier configured to multiply the estimate of background noise in a previous frame of the speech signal by a constant value that is marginally greater than one to generate a percentage increased estimate value ;
a second multiplexer coupled to the first adder and the multiplier and configured to receive the absolutely increased estimate value and the percentage increased estimate value ;
a third circuit coupled to the first adder , the second multiplier , and the second multiplexer , and configured to control the second multiplexer to select the larger of the absolutely increased estimate value and the percentage increased estimate value as the increased estimate value .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .
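
The spectral diversity computation of claim 24 (a per-band energy ratio between the current and previous frame, summed with weights over bands above a given number) can be sketched as below. The uniform default weights, the guard against division by zero, and the name `spectral_diversity` are assumptions added for illustration.

```python
def spectral_diversity(cur_energy, prev_energy, given_band,
                       weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over the upper bands.

    cur_energy, prev_energy : per-band energies of the two frames
    given_band              : only bands with index > given_band are used
    weights                 : per-band weights; uniform if omitted (assumption)
    eps                     : hypothetical floor that avoids division by zero
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for band in range(given_band + 1, len(cur_energy)):
        ratio = cur_energy[band] / max(prev_energy[band], eps)
        total += weights[band] * ratio
    return total


diversity = spectral_diversity([1.0, 2.0, 4.0, 9.0], [1.0, 1.0, 2.0, 3.0],
                               given_band=1)
```

With bands 2 and 3 selected, the ratios are 2.0 and 3.0, so the unweighted sum is 5.0.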

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
EP1239456A1
CLAIM 1
A method of generating a current estimate of background noise in a frame of speech in a speech signal comprising a plurality of frames , the method comprising : calculating an energy level of a frame of the speech signal ;
obtaining an estimate of background noise in a previous frame (activity prediction parameter) of the speech signal ;
and generating the current estimate of background noise based on the energy level and the estimate of background noise in a previous frame of the speech signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
EP1239456A1
CLAIM 1
A method of generating a current estimate of background noise in a frame of speech in a speech signal comprising a plurality of frames , the method comprising : calculating an energy level of a frame of the speech signal ;
obtaining an estimate of background noise in a previous frame (activity prediction parameter) of the speech signal ;
and generating the current estimate of background noise based on the energy level and the estimate of background noise in a previous frame of the speech signal .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1239456A1
CLAIM 1
A method of generating a current estimate of background noise in a frame of speech in a speech signal comprising a plurality of frames , the method comprising : calculating an energy level of a frame of the speech signal ;
obtaining an estimate of background noise in a previous frame (activity prediction parameter) of the speech signal ;
and generating the current estimate of background noise based on the energy level and the estimate of background noise in a previous frame of the speech signal .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1239456A1
CLAIM 11
A speech processor configured to process a speech signal comprising a plurality of frames , the speech processor comprising : a first circuit configured to calculate an energy level of a frame of the speech signal ;
a second circuit configured to compute an estimate of background noise in a previous frame of the speech signal and to increase the estimate of background noise in a previous frame of the speech signal by a predefined amount to generate an increased estimate value ;
a first multiplexer coupled to the first and second circuits and configured to receive the increased estimate value and the energy level , and to select either the increased estimate value or the energy level as an estimate of background noise in a current frame (current frame) of the speech signal .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
EP1239456A1
CLAIM 14
The speech processor of claim 12 , wherein the second circuit comprises : a first adder (background noise signal) configured to add a nominal constant value to the estimate of background noise in a previous frame of the speech signal to generate an absolutely increased estimate value ;
a multiplier configured to multiply the estimate of background noise in a previous frame of the speech signal by a constant value that is marginally greater than one to generate a percentage increased estimate value ;
a second multiplexer coupled to the first adder and the multiplier and configured to receive the absolutely increased estimate value and the percentage increased estimate value ;
a third circuit coupled to the first adder , the second multiplier , and the second multiplexer , and configured to control the second multiplexer to select the larger of the absolutely increased estimate value and the percentage increased estimate value as the increased estimate value .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
EP1239456A1
CLAIM 14
The speech processor of claim 12 , wherein the second circuit comprises : a first adder (background noise signal) configured to add a nominal constant value to the estimate of background noise in a previous frame of the speech signal to generate an absolutely increased estimate value ;
a multiplier configured to multiply the estimate of background noise in a previous frame of the speech signal by a constant value that is marginally greater than one to generate a percentage increased estimate value ;
a second multiplexer coupled to the first adder and the multiplier and configured to receive the absolutely increased estimate value and the percentage increased estimate value ;
a third circuit coupled to the first adder , the second multiplier , and the second multiplexer , and configured to control the second multiplexer to select the larger of the absolutely increased estimate value and the percentage increased estimate value as the increased estimate value .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first adder) and preventing update of noise energy estimates .
EP1239456A1
CLAIM 14
The speech processor of claim 12 , wherein the second circuit comprises : a first adder (background noise signal) configured to add a nominal constant value to the estimate of background noise in a previous frame of the speech signal to generate an absolutely increased estimate value ;
a multiplier configured to multiply the estimate of background noise in a previous frame of the speech signal by a constant value that is marginally greater than one to generate a percentage increased estimate value ;
a second multiplexer coupled to the first adder and the multiplier and configured to receive the absolutely increased estimate value and the percentage increased estimate value ;
a third circuit coupled to the first adder , the second multiplier , and the second multiplexer , and configured to control the second multiplexer to select the larger of the absolutely increased estimate value and the percentage increased estimate value as the increased estimate value .




US5040217A

Filed: 1989-10-18     Issued: 1991-08-13

Perceptual coding of audio signals

(Original Assignee) Nokia Bell Labs     (Current Assignee) Nokia Bell Labs ; AT&T Corp

Karlheinz Brandenburg, James D. Johnston
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (said blocks) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5040217A
CLAIM 1
. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples , each such block having a discrete short-time spectrum , S(ω i) , i=1 , 2 , . . . N , for each of said blocks (frequency spectrum) , comprising predicting , for each block , an estimate of the values for each S(ω i) based on the values for S(ω i) for one or more prior blocks , determining for each frequency , ω i , a randomness metric based on the predicted value for each S(ω i) and the actual value for S(ω i) for each block , based on said randomness metrics , and the distribution of power with frequency in the block , determining the value of a tonality function as a function of frequency , and based on said tonality function , estimating the noise masking threshold at each ω i .
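
For orientation, the "randomness metric" of US5040217A claim 1 can be sketched using the normalization recited in claim 6 of the same reference (Euclidean distance between actual and predicted values, divided by the sum of their magnitudes). The predictor itself is not reproduced here; the predicted values are simply passed in, and the function names are assumptions.

```python
def randomness_metric(actual, predicted):
    """Normalized distance between an actual and a predicted spectral value.

    Both arguments are complex numbers for one frequency bin. The result is
    0.0 for a perfect prediction and at most 1.0.
    """
    denom = abs(actual) + abs(predicted)
    if denom == 0.0:
        # Both values are zero: treat the bin as perfectly predicted (assumption).
        return 0.0
    return abs(actual - predicted) / denom


def randomness_per_bin(actual_block, predicted_block):
    """Apply the metric bin by bin for one block of the short-time spectrum."""
    return [randomness_metric(a, p) for a, p in zip(actual_block, predicted_block)]
```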

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (said blocks) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5040217A
CLAIM 1
. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples , each such block having a discrete short-time spectrum , S(ω i) , i=1 , 2 , . . . N , for each of said blocks (frequency spectrum) , comprising predicting , for each block , an estimate of the values for each S(ω i) based on the values for S(ω i) for one or more prior blocks , determining for each frequency , ω i , a randomness metric based on the predicted value for each S(ω i) and the actual value for S(ω i) for each block , based on said randomness metrics , and the distribution of power with frequency in the block , determining the value of a tonality function as a function of frequency , and based on said tonality function , estimating the noise masking threshold at each ω i .
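
The three steps of US8990073B2 claim 2 (search for minima, connect them to form a spectral floor, subtract the floor) can be sketched as follows. This is an illustrative reading, not the patentee's implementation: the helper names are hypothetical, the minima are connected by straight lines, and the two end bins are treated as minima so the floor spans the whole spectrum.

```python
def find_minima(spectrum):
    """Indices of local minima; end bins are included by assumption."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated spectral floor from the spectrum."""
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]
```

Under this reading the residual is zero at every minimum, which is what lets the later peak detection delimit peaks by successive minima.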

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (function value) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US5040217A
CLAIM 9
. The method of claim 8 , further comprising modifying said limited threshold function to eliminate any existing pre-echoes , thereby generating an output threshold function value (noise character parameter) for each ω i .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (function value) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (absolute value) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5040217A
CLAIM 6
. The method of claim 5 , wherein said determining of said randomness metric further comprises normalizing said euclidian distance with respect to the sum of the magnitude of said actual magnitude for S(ω i) and the absolute value (second group) of said estimate of S(ω i) .

US5040217A
CLAIM 9
. The method of claim 8 , further comprising modifying said limited threshold function to eliminate any existing pre-echoes , thereby generating an output threshold function value (noise character parameter) for each ω i .
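
The noise character parameter of US8990073B2 claim 28 can be sketched as an energy ratio between two groups of frequency bands followed by long-term smoothing. The sketch below makes several assumptions that the claim leaves open: group energies are plain sums, the ratio is first group over second group, and the long-term value is a first-order recursive average with a hypothetical factor alpha.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    # eps guards against division by zero when the second group is silent.
    return first_energy / max(second_energy, eps)


def update_long_term(prev_long_term, current, alpha=0.9):
    """Recursive average; alpha is an assumed smoothing factor."""
    return alpha * prev_long_term + (1.0 - alpha) * current
```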

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (function value) inferior than a given fixed threshold .
US5040217A
CLAIM 9
. The method of claim 8 , further comprising modifying said limited threshold function to eliminate any existing pre-echoes , thereby generating an output threshold function value (noise character parameter) for each ω i .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (said blocks) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5040217A
CLAIM 1
. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples , each such block having a discrete short-time spectrum , S(ω i) , i=1 , 2 , . . . N , for each of said blocks (frequency spectrum) , comprising predicting , for each block , an estimate of the values for each S(ω i) based on the values for S(ω i) for one or more prior blocks , determining for each frequency , ω i , a randomness metric based on the predicted value for each S(ω i) and the actual value for S(ω i) for each block , based on said randomness metrics , and the distribution of power with frequency in the block , determining the value of a tonality function as a function of frequency , and based on said tonality function , estimating the noise masking threshold at each ω i .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (said blocks) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5040217A
CLAIM 1
. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples , each such block having a discrete short-time spectrum , S(ω i) , i=1 , 2 , . . . N , for each of said blocks (frequency spectrum) , comprising predicting , for each block , an estimate of the values for each S(ω i) based on the values for S(ω i) for one or more prior blocks , determining for each frequency , ω i , a randomness metric based on the predicted value for each S(ω i) and the actual value for S(ω i) for each block , based on said randomness metrics , and the distribution of power with frequency in the block , determining the value of a tonality function as a function of frequency , and based on said tonality function , estimating the noise masking threshold at each ω i .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (said blocks) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5040217A
CLAIM 1
. A method of processing an ordered time sequence of audio signals partitioned into contiguous blocks of samples , each such block having a discrete short-time spectrum , S(ω i) , i=1 , 2 , . . . N , for each of said blocks (frequency spectrum) , comprising predicting , for each block , an estimate of the values for each S(ω i) based on the values for S(ω i) for one or more prior blocks , determining for each frequency , ω i , a randomness metric based on the predicted value for each S(ω i) and the actual value for S(ω i) for each block , based on said randomness metrics , and the distribution of power with frequency in the block , determining the value of a tonality function as a function of frequency , and based on said tonality function , estimating the noise masking threshold at each ω i .




US20070136059A1

Filed: 2006-12-01     Issued: 2007-06-14

Multi-voice speech recognition

(Original Assignee) Gadbois Gregory J     

Gregory Gadbois
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (acoustic models) in the sound signal .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 10
. A method for detecting sound activity (acoustic models) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (acoustic models) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (acoustic models) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (acoustic models) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .
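
The comparison in US8990073B2 claim 14 (average SNR against a threshold that depends on the long-term SNR) can be sketched as below. The claim does not state the threshold function, so the clipped linear form and every constant here are purely hypothetical placeholders chosen for illustration.

```python
def snr_threshold(snr_lt, slope=0.5, offset=3.0, lo=2.0, hi=15.0):
    """Hypothetical threshold: a linear function of the long-term SNR, clipped."""
    return min(hi, max(lo, slope * snr_lt + offset))


def is_active(snr_av, snr_lt):
    """Declare the frame active when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt)
```

Making the threshold depend on the long-term SNR lets the detector be stricter in clean conditions and more permissive in noisy ones, whatever the actual function used.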

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (acoustic models) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (acoustic models) detection further comprises updating the noise estimates for a next frame .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (speech input) residual error energies .
US20070136059A1
CLAIM 4
. Speech recognition method of claim 1 , wherein the recognizers operate synchronously by processing speech input (linear prediction) frame by frame , the pruning being executed every frame .

US8990073B2
CLAIM 35
. A device for detecting sound activity (acoustic models) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 36
. A device for detecting sound activity (acoustic models) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (acoustic models) detector .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (acoustic models) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (acoustic models) detector .
US20070136059A1
CLAIM 2
. Speech recognition method of claim 1 , wherein speech recognition engines are configured with acoustic models (sound activity, sound activity detection, detecting sound activity) and language models and executed on a computer .




WO2007051548A1

Filed: 2006-10-24     Issued: 2007-05-10

Time warped modified transform coding of audio signals

(Original Assignee) Coding Technologies Ab     

Lars Villemoes
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (two frames) , and an initial value of the long term correlation map .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (sample values) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (two frames) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (sample values) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (sample values) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .
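
The correlation map of US8990073B2 claim 4 can be sketched as follows: for each peak, a normalized correlation with the previous residual spectrum is computed over the bins between the two delimiting minima, and that score is written to every bin of the peak. The exact normalization is not given in the claim; the standard normalized cross-correlation used here is an assumption, as are the function names.

```python
import math


def normalized_correlation(cur, prev, lo, hi):
    """Normalized cross-correlation of cur and prev over bins lo..hi inclusive."""
    bins = range(lo, hi + 1)
    num = sum(cur[i] * prev[i] for i in bins)
    den = math.sqrt(sum(cur[i] ** 2 for i in bins) * sum(prev[i] ** 2 for i in bins))
    return num / den if den > 0.0 else 0.0


def correlation_map(cur, prev, minima):
    """Assign each peak's score to all bins between its delimiting minima."""
    cmap = [0.0] * len(cur)
    for lo, hi in zip(minima, minima[1:]):
        score = normalized_correlation(cur, prev, lo, hi)
        # A minimum shared by two peaks ends up with the later peak's score.
        for i in range(lo, hi + 1):
            cmap[i] = score
    return cmap
```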

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (when r) .
WO2007051548A1
CLAIM 35
. Computer program having a program code for performing , when r (next frame) unning on a computer , a method for deriving a representation of an audio signal having a first frame , a second frame following the first frame , and a third frame following the second frame , the method comprising : estimating first warp information for the first and the second frame and for estimating second warp information for the second frame and the third frame , the warp information describing a pitch information of the audio signal ;
deriving first spectral coefficients for the first and the second frame using the first warp information and for deriving second spectral coefficients for the second and the third frame using the second warp information ;
and outputting the representation of the audio signal including the first and the second spectral coefficients .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (when r) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
WO2007051548A1
CLAIM 35
. Computer program having a program code for performing , when r (next frame) unning on a computer , a method for deriving a representation of an audio signal having a first frame , a second frame following the first frame , and a third frame following the second frame , the method comprising : estimating first warp information for the first and the second frame and for estimating second warp information for the second frame and the third frame , the warp information describing a pitch information of the audio signal ;
deriving first spectral coefficients for the first and the second frame using the first warp information and for deriving second spectral coefficients for the second and the third frame using the second warp information ;
and outputting the representation of the audio signal including the first and the second spectral coefficients .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (two frames) energy and an average frame (two frames) energy .
WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (two frames) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .
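
The spectral diversity parameter of US8990073B2 claim 24 can be sketched as a weighted sum of per-band energy ratios between the current and previous frame, restricted to bands above a given index. The weights are not specified in the claim; uniform weights are assumed by default, and the names below are hypothetical.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above min_band."""
    n_bands = len(cur_energy)
    if weights is None:
        # Assumed uniform weighting; the claim only requires a weighted sum.
        weights = [1.0] * n_bands
    total = 0.0
    for b in range(min_band + 1, n_bands):
        # eps guards against division by zero for a silent previous band.
        ratio = cur_energy[b] / max(prev_energy[b], eps)
        total += weights[b] * ratio
    return total
```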

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value (second intermediate) of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2007051548A1
CLAIM 9
. Encoder in accordance with claim 1 , in which the warp estimator is operative to estimate the warp information such that first intermediate warp information of a first corresponding frame and second intermediate (second energy value) warp information of a second corresponding frame are combined using a combination rule .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (two frames) , and an initial value of the long-term correlation map .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (two frames) , and an initial value of the long-term correlation map .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (sample values) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (two frames) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
WO2007051548A1
CLAIM 7
. Encoder in accordance with claim 1 , which is adapted to derive a representation of an audio signal given by a sequence of discrete sample values (current residual spectrum) .

WO2007051548A1
CLAIM 15
. Encoder in accordance with claim 1 , in which the spectral analyzer is adapted to derive the spectral coefficients using a weighted representation of two frames (current frame, average frame, current frame energy, average frame energy) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20070071089A1

Filed: 2006-09-28     Issued: 2007-03-29

Scalable audio encoding and decoding apparatus, method, and medium

(Original Assignee) Samsung Electronics Co Ltd     (Current Assignee) Samsung Electronics Co Ltd

Dohyung Kim, Miyoung Kim, Shihwa Lee, Sangwook Kim
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (high frequency band) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band of the frame is a frequency band of the base layer , and a high frequency band (frequency bin, frequency bands, first frequency bands) of the frame is a frequency band of the first enhancement layer .
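
The claim 5 steps charted above (one-pole filtering on a bin-by-bin basis, then summing over the bins) can be sketched as follows. This is a minimal illustration, not the patentee's implementation: the recursion `alpha * previous + (1 - alpha) * current`, the value of `alpha`, the all-zero initial map, and the function names are assumptions. The claims themselves recite only an update factor, the correlation map of the current frame, and an initial value.

```python
def update_long_term_map(prev_map, cor_map, alpha=0.9):
    """Apply a one-pole (first-order recursive) filter independently to each bin."""
    if len(prev_map) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_map, cor_map)]


def summed_long_term_map(long_term_map):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term_map)


# Hypothetical data: 4 bins, the same correlation map observed for 3 frames.
long_term = [0.0, 0.0, 0.0, 0.0]  # assumed initial value of the long-term map
for cor_map in [[0.0, 1.0, 1.0, 0.0]] * 3:
    long_term = update_long_term_map(long_term, cor_map, alpha=0.5)
total = summed_long_term_map(long_term)
```

Bins that stay correlated from frame to frame drift toward 1 in the long-term map, so a large summed value suggests stable tonal content under these assumptions.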

US8990073B2
CLAIM 10
. A method for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band (detecting sound activity) of the frame is a frequency band of the base layer , and a high frequency band of the frame is a frequency band of the first enhancement layer .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (scalable decoding method) residual error energies .
US20070071089A1
CLAIM 16
. A scalable decoding method (linear prediction) comprising : dividing an encoded frame into a base layer , a first enhancement layer , and a second enhancement layer ;
and decoding the base layer , the first enhancement layer , and the second enhancement layer , wherein the base layer is a layer to be decoded using a predetermined decoding method , a low frequency band of the frame is a frequency band of the base layer , and a high frequency band of the frame is a frequency band of the first enhancement layer .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (high frequency band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band of the frame is a frequency band of the base layer , and a high frequency band (frequency bin, frequency bands, first frequency bands) of the frame is a frequency band of the first enhancement layer .
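
The spectral diversity computation recited in claim 24 above can be sketched as below. This is an illustrative reading of the claim language only: the per-band energy lists, the `min_band` parameter, the uniform default weights, and the small `eps` guard against division by zero are all assumptions.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands above min_band."""
    if len(cur_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed uniform weighting
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):  # bands higher than min_band
        ratio = cur_energy[band] / (prev_energy[band] + eps)
        total += weights[band] * ratio
    return total


# Hypothetical per-band energies for two consecutive frames.
diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], min_band=1)
```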

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (high frequency band) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band of the frame is a frequency band of the base layer , and a high frequency band (frequency bin, frequency bands, first frequency bands) of the frame is a frequency band of the first enhancement layer .
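
The four steps of claim 28 above (grouping the bands, computing two energy values, taking their ratio, and tracking a long-term value) can be sketched as follows. The split point `n_first`, the smoothing factor `alpha`, the recursive averaging form, and the `eps` guard are assumptions for illustration; the claim does not specify them.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    return first_energy / (second_energy + eps)


def long_term_value(previous, current, alpha=0.9):
    """Assumed recursive average used to track the parameter over frames."""
    return alpha * previous + (1.0 - alpha) * current


# Hypothetical per-band energies for one frame.
ratio = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
tracked = long_term_value(0.0, ratio, alpha=0.75)
```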

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (high frequency band) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band of the frame is a frequency band of the base layer , and a high frequency band (frequency bin, frequency bands, first frequency bands) of the frame is a frequency band of the first enhancement layer .

US8990073B2
CLAIM 35
. A device for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band (detecting sound activity) of the frame is a frequency band of the base layer , and a high frequency band of the frame is a frequency band of the first enhancement layer .

US8990073B2
CLAIM 36
. A device for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20070071089A1
CLAIM 1
. A scalable encoding apparatus comprising : a scalable encoder to encode a base layer , a first enhancement layer , and a second enhancement layer in a frame having the base layer ;
and an encoding frame generator to generate an encoded frame by synthesizing the encoded results , wherein the base layer is a layer to be encoded using a predetermined encoding method , a low frequency band (detecting sound activity) of the frame is a frequency band of the base layer , and a high frequency band of the frame is a frequency band of the first enhancement layer .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1909060A

Filed: 2006-08-01     Issued: 2007-02-07

Method and apparatus for extracting voiced/unvoiced classification information

(Original Assignee) 三星电子株式会社     

金炫秀
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (coefficient calculation) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1909060A
CLAIM 17
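. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.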

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (coefficient calculation) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
CN1909060A
CLAIM 17
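. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.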
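
For orientation, the three steps of US8990073B2 claim 2 charted above (searching for minima, connecting them into a spectral floor, and subtracting the floor) can be sketched as follows. This is a hedged illustration, not the patented implementation: treating "connecting" as piecewise-linear interpolation, always counting the two end bins as minima, and all function names are assumptions.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are always included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = [float(s) for s in spectrum]
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum; also return the minima."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


# Hypothetical 7-bin magnitude spectrum with three peaks.
residual, minima = residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0, 5.0, 2.0])
```

Under these assumptions the residual is zero at every minimum and positive in between, which is what makes the spans between minima usable as "peaks" in the later claim steps.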

US8990073B2
CLAIM 3
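. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (coefficient calculation) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
CN1909060A
CLAIM 17
. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.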
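
The peak detection recited in US8990073B2 claim 3 above (locating a maximum between each pair of consecutive minima) can be sketched as below. The tuple layout `(left_minimum, right_minimum, peak_bin)` and the example data are hypothetical choices made for illustration.

```python
def detect_peaks(residual, minima):
    """Return (left_minimum, right_minimum, peak_bin) for each pair of consecutive minima."""
    peaks = []
    for lo, hi in zip(minima, minima[1:]):
        # Bin with the largest residual value inside the span [lo, hi].
        peak_bin = max(range(lo, hi + 1), key=lambda i: residual[i])
        peaks.append((lo, hi, peak_bin))
    return peaks


# Hypothetical residual spectrum and the indices of its minima.
example_residual = [0.0, 2.0, 0.0, 2.5, 0.0, 3.0, 0.0]
example_minima = [0, 2, 4, 6]
peaks = detect_peaks(example_residual, example_minima)
```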

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (coefficient calculation) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
CN1909060A
CLAIM 17
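. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.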
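
The correlation map of US8990073B2 claim 4 above can be sketched as follows: each peak (the span between two consecutive minima) receives a normalized correlation score against the previous residual spectrum, and that score is assigned to every bin of the span. The particular normalization used here (cross-correlation divided by the geometric mean of the two energies) is an assumption, since the claim says only "normalized correlation value"; the function name and data are likewise hypothetical.

```python
import math


def correlation_map(cur_res, prev_res, minima):
    """Score each peak against the previous frame and spread the score over its bins."""
    cmap = [0.0] * len(cur_res)
    for lo, hi in zip(minima, minima[1:]):
        bins = range(lo, hi + 1)
        num = sum(cur_res[i] * prev_res[i] for i in bins)
        den = math.sqrt(
            sum(cur_res[i] ** 2 for i in bins) * sum(prev_res[i] ** 2 for i in bins)
        )
        score = num / den if den > 0.0 else 0.0
        for i in bins:
            cmap[i] = score  # a shared boundary bin keeps the later peak's score
    return cmap


# Hypothetical data: the middle peak is absent from the previous frame.
cur = [0.0, 2.0, 0.0, 2.5, 0.0, 3.0, 0.0]
prev = [0.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0]
cmap = correlation_map(cur, prev, [0, 2, 4, 6])
```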

US8990073B2
CLAIM 11
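. A method as defined in claim 10 , further comprising preventing update (residual energy) of noise energy estimates when a tonal sound signal is detected .
CN1909060A
CLAIM 3
. The method of claim 2, wherein the step of calculating the harmonic signal and the residual signal other than the harmonic signal comprises: calculating the relevant harmonic coefficients so as to minimize the residual energy (preventing update); obtaining the harmonic signal using the calculated harmonic coefficients; and, once the harmonic signal has been obtained, calculating the residual signal by subtracting the harmonic signal from the converted speech signal.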

US8990073B2
CLAIM 15
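. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (unit calculation, signal calculation, performing calculation, subtracting from the signal, to calculate) .
CN1909060A
CLAIM 1
. A method for extracting voiced/unvoiced classification information using harmonic components of a speech signal, the method comprising the steps of: converting an input speech signal into a frequency-domain speech signal; calculating, from the converted speech signal, a harmonic signal and a residual signal other than the harmonic signal; using the calculation results for the harmonic signal and the residual signal to calculate (SNR calculation) a harmonic-to-residual ratio (HRR); and classifying voiced/unvoiced sound by comparing the HRR with a threshold.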

CN1909060A
CLAIM 3
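. The method of claim 2, wherein the step of calculating the harmonic signal and the residual signal other than the harmonic signal comprises: calculating the relevant harmonic coefficients so as to minimize the residual energy; obtaining the harmonic signal using the calculated harmonic coefficients; and, once the harmonic signal has been obtained, calculating the residual signal by subtracting (SNR calculation) the harmonic signal from the converted speech signal.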

CN1909060A
CLAIM 7
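. The method of claim 1, wherein the step of calculating the HRR comprises: obtaining a harmonic energy using the calculated harmonic signal and residual signal; calculating a residual energy by subtracting the harmonic energy from the total energy of the speech signal; and performing a calculation (SNR calculation) of the ratio of the calculated harmonic energy to the calculated residual energy.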

CN1909060A
CLAIM 16
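. An apparatus for extracting voiced/unvoiced classification information using harmonic components of a speech signal, the apparatus comprising: a speech signal input unit that receives a speech signal; a frequency-domain conversion unit that converts the received time-domain speech signal into a frequency-domain speech signal; a harmonic-residual signal calculation (SNR calculation) unit that calculates, from the converted speech signal, a harmonic signal and a residual signal other than the harmonic signal; and a harmonic-to-residual ratio (HRR) calculation unit that calculates the energy ratio of the harmonic signal to the residual signal using the calculation results of the harmonic-residual signal calculation unit.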

CN1909060A
CLAIM 20
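. The apparatus of claim 19, wherein the harmonic-to-noise energy ratio calculation unit calculates (SNR calculation) the energy ratio of all harmonic parts to all noise parts (HNR).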
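
To make the prior-art side of this mapping concrete, the harmonic-to-residual ratio (HRR) recited in CN1909060A claims 1 and 7 above can be sketched as below: the residual energy is the total energy minus the harmonic energy, and the HRR is compared with a threshold. The function names, the `eps` guard, and the example threshold are assumptions; the reference's claims do not state numeric values.

```python
def harmonic_to_residual_ratio(total_energy, harmonic_energy, eps=1e-12):
    """HRR = harmonic energy / (total energy - harmonic energy)."""
    residual_energy = total_energy - harmonic_energy
    return harmonic_energy / max(residual_energy, eps)


def is_voiced(hrr, threshold):
    """Classify as voiced when the HRR exceeds the (hypothetical) threshold."""
    return hrr > threshold


# Hypothetical frame energies.
hrr = harmonic_to_residual_ratio(total_energy=10.0, harmonic_energy=8.0)
voiced = is_voiced(hrr, threshold=1.0)
```

Note that this ratio compares harmonic content with non-harmonic content within one frame, whereas the SNR calculation of US8990073B2 claim 15 uses noise energy estimates carried over from a previous frame.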

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy (harmonic model) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1909060A
CLAIM 17
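. The apparatus of claim 16, further comprising: a harmonic coefficient calculation unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model (first energy), the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.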

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (coefficient calculation) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1909060A
CLAIM 17
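. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.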

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (coefficient calculation) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1909060A
CLAIM 17
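. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.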

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (coefficient calculation) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
CN1909060A
CLAIM 17
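. The apparatus of claim 16, further comprising: a harmonic coefficient calculation (current residual spectrum) unit that calculates the relevant harmonic coefficients so as to minimize the energy of the residual signal in the speech signal represented using a harmonic model, the harmonic model being expressed as the sum of harmonics of a fundamental frequency and a small residual; and a pitch detection unit that provides the pitch required for calculating the harmonic coefficients.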

US8990073B2
CLAIM 40
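. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update (residual energy) of noise energy estimates .
CN1909060A
CLAIM 3
. The method of claim 2, wherein the step of calculating the harmonic signal and the residual signal other than the harmonic signal comprises: calculating the relevant harmonic coefficients so as to minimize the residual energy (preventing update); obtaining the harmonic signal using the calculated harmonic coefficients; and, once the harmonic signal has been obtained, calculating the residual signal by subtracting the harmonic signal from the converted speech signal.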




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2007065636A

Filed: 2006-07-31     Issued: 2007-03-15

Method and apparatus for generating comfort noise in a voice communication system

(Original Assignee) Motorola Inc; モトローラ・インコーポレイテッドMotorola Incorporated     

James P Ashley, Edgardo M Cruz-Zeno, エム.クルーズ−ジーノ エドガード, ピー.アシュリー ジェームズ
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (speech signal) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2007065636A
CLAIM 6
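The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.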

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (speech signal) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2007065636A
CLAIM 6
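The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.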

US8990073B2
CLAIM 6
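. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (speech signal) .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.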

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (speech signal) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.
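
The strong-tone search of US8990073B2 claim 7 above amounts to a per-bin comparison against a fixed threshold, which can be sketched as follows. The threshold value of 0.95 and the function name are assumptions for illustration only; the claim does not give a number.

```python
def strong_tone_bins(cor_map, threshold=0.95):
    """Return the indices of bins whose correlation magnitude exceeds the fixed threshold."""
    return [i for i, value in enumerate(cor_map) if abs(value) > threshold]


# Hypothetical correlation map for one frame.
tone_bins = strong_tone_bins([0.10, 0.99, 0.97, 0.40, 0.95])
strong_tone_detected = len(tone_bins) > 0
```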

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (speech signal) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (speech signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 11
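. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (speech signal) is detected .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.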

US8990073B2
CLAIM 12
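. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (speech signal) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.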

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (speech signal) based on a frequency dependent signal-to-noise ratio (SNR) .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (speech signal) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 17
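. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (speech signal) and a ratio between a second order and a sixteenth order (system) of linear prediction residual error energies .
JP2007065636A
CLAIM 1
A method for generating comfort noise in a voice communication system (sixteenth order), the method comprising: receiving a plurality of information frames representing speech plus background noise; estimating one or more background noise characteristics based on the plurality of information frames; and generating a comfort noise signal based on the one or more background noise characteristics.

JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.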

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (speech signal) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (speech signal) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.
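
Taken together, US8990073B2 claims 18 and 19 above describe a two-way decision on the average SNR, which can be sketched as below. The handling of exact equality (treated here as inactive) is an assumption, because the claims address only the "inferior to" and "larger than" cases; the function name and numeric values are hypothetical.

```python
def classify_sound_activity(snr_av, threshold):
    """Return 'active' when the average SNR exceeds the threshold, else 'inactive'."""
    # Equality is not covered by the claim language; it is mapped to 'inactive' here.
    return "active" if snr_av > threshold else "inactive"


# Hypothetical average SNR values (dB) against a hypothetical threshold.
decisions = [classify_sound_activity(snr, threshold=10.0) for snr in (4.0, 10.0, 17.5)]
```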

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (speech signal) prevents updating of noise energy estimates when a music signal is detected .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (speech signal) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JP2007065636A
CLAIM 6
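The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.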

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (speech signal) .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (speech signal) and the complementary non-stationarity parameter .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (speech signal) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007065636A
CLAIM 6
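The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.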

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (speech signal) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007065636A
CLAIM 6
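The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.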

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (speech signal) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2007065636A
CLAIM 6
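The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.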

US8990073B2
CLAIM 34
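. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (speech signal) .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.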

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (speech signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (speech signal) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2007065636A
CLAIM 6
The method of claim 1, further comprising: generating a speech signal (sound signal) from the plurality of information frames; and generating an output signal by switching between the comfort noise signal and the speech signal based on voice activity detection.

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JP2007065636A
CLAIM 6
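The method according to claim 1, further comprising: generating a speech signal, 音声信号 (sound signal), from a plurality of information frames; and generating an output signal by switching between a comfort noise signal and the speech signal based on voice activity detection.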

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JP2007065636A
CLAIM 6
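The method according to claim 1, further comprising: generating a speech signal, 音声信号 (sound signal), from a plurality of information frames; and generating an output signal by switching between a comfort noise signal and the speech signal based on voice activity detection.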




CN1905006A

Filed: 2006-07-27     Issued: 2007-01-31

Noise suppression system, method, and program

(Original Assignee) 日本电气株式会社     

荒川隆行, 辻川刚范
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (进行上述) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (平均值) of the long term correlation map .
CN1905006A
CLAIM 3
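. The noise suppression system according to claim 1, characterized in that the means for correcting the temporarily estimated speech assumes probability distributions as the standard pattern, obtains an expected speech value from the probability that each probability distribution constituting the standard pattern outputs the temporarily estimated speech and from the mean, 平均值 (initial value, term value), of each such probability distribution, and sets the expected speech value as the corrected value of the temporarily estimated speech.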

CN1905006A
CLAIM 6
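. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.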

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (进行上述) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
CN1905006A
CLAIM 6
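. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.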
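
The three steps recited in claim 2 of US8990073B2 (locate the minima, connect them into a spectral floor, subtract the floor) can be illustrated with a minimal Python sketch. It is not taken from the patent or from either reference: the function names, the linear interpolation between minima, and the choice to always treat the first and last bins as minima are assumptions made only for illustration.

```python
def local_minima(spectrum):
    """Return indices of local minima of a magnitude spectrum.

    Assumption: the first and last bins are always treated as minima so
    that the floor spans the whole spectrum.
    """
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum of the current frame."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

For a toy frame `[1.0, 3.0, 1.0, 5.0, 2.0]` the minima fall at bins 0, 2 and 4, and the residual keeps only the two peaks that rise above the floor.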

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (进行上述) .
CN1905006A
CLAIM 6
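. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.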

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (进行上述) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
CN1905006A
CLAIM 6
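. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.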

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (进行上述) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
CN1905006A
CLAIM 6
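. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.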

US8990073B2
CLAIM 10
. A method for detecting sound activity (声音信号) in a sound signal (进行上述) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1905006A
CLAIM 6
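. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.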

CN1905006A
CLAIM 19
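. A speech recognition device, characterized by having the noise suppression system according to claim 1 and comprising means for receiving the noise-suppressed speech signal, 声音信号 (detecting sound activity), from the noise suppression system and performing speech recognition.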

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (进行上述) is detected .
CN1905006A
CLAIM 6
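. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.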

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (进行上述) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
CN1905006A
CLAIM 6
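. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.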

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (进行上述) based on a frequency dependent signal-to-noise ratio (SNR) .
CN1905006A
CLAIM 6
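. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.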

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (进行上述) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (信号计算) .
CN1905006A
CLAIM 1
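. A noise suppression system, characterized by comprising: means for calculating an average noise spectrum from an input signal, 信号计算 (SNR calculation); means for obtaining temporarily estimated speech in the spectral domain from the input signal and the average noise spectrum; and means for correcting the temporarily estimated speech using a standard speech pattern stored in advance in a storage unit.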

CN1905006A
CLAIM 6
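. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.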

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (进行上述) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
CN1905006A
CLAIM 6
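. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.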

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (进行上述) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
CN1905006A
CLAIM 6
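. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.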

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (进行上述) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
CN1905006A
CLAIM 6
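. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.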

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (进行上述) prevents updating of noise energy estimates when a music signal is detected .
CN1905006A
CLAIM 6
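. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.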

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (进行上述) in a current frame and an energy of the sound signal in a previous frame , for frequency bands (频率方向) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
CN1905006A
CLAIM 6
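. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.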

CN1905006A
CLAIM 9
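. The noise suppression system according to claim 7, characterized in that the means for deriving the noise reduction filter smooths the corrected estimated speech, or the a priori SNR obtained by dividing the corrected estimated speech by the average noise spectrum, in at least one of the time direction, the frequency direction, 频率方向 (frequency bands), and the feature-vector dimension.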
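
The spectral diversity computation recited in claim 24 of US8990073B2 (a per-band ratio of current-frame to previous-frame energy, summed with weights over the bands above a given number) can be sketched as follows. This is a hedged illustration only: the function name, the `min_band` and `weights` parameters, the uniform default weights, and the `eps` guard against division by zero are all assumptions and are not specified in the charted claim text.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for band indices above min_band.

    curr_energy, prev_energy: per-band energies of the current and previous
    frame (same length). weights: optional per-band weights; uniform weights
    are assumed when none are given.
    """
    n = len(curr_energy)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for b in range(min_band + 1, n):
        # Ratio between current-frame and previous-frame energy in band b.
        ratio = curr_energy[b] / max(prev_energy[b], eps)
        total += weights[b] * ratio
    return total
```

With `curr_energy=[1, 2, 4, 8]`, `prev_energy=[1, 1, 2, 2]` and `min_band=1`, only bands 2 and 3 contribute, with ratios 2 and 4.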

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (进行上述) .
CN1905006A
CLAIM 6
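. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.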

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (进行上述) and the complementary non-stationarity parameter .
CN1905006A
CLAIM 6
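. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.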

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (频率方向) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (概率分布) for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1905006A
CLAIM 3
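. The noise suppression system according to claim 1, characterized in that the means for correcting the temporarily estimated speech assumes probability distributions, 概率分布 (first energy value), as the standard pattern, obtains an expected speech value from the probability that each probability distribution constituting the standard pattern outputs the temporarily estimated speech and from the mean of each such probability distribution, and sets the expected speech value as the corrected value of the temporarily estimated speech.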

CN1905006A
CLAIM 9
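. The noise suppression system according to claim 7, characterized in that the means for deriving the noise reduction filter smooths the corrected estimated speech, or the a priori SNR obtained by dividing the corrected estimated speech by the average noise spectrum, in at least one of the time direction, the frequency direction, 频率方向 (frequency bands), and the feature-vector dimension.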
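
The noise character parameter of claim 28 of US8990073B2 (split the bands into two groups, take one energy value per group, form their ratio, then track a long-term value) can be sketched as below. This is a minimal illustration under stated assumptions: the claim text charted here does not say how each group's energy value is formed or how the long-term value is updated, so the plain sums, the first-order smoothing with a hypothetical factor `alpha`, and the `eps` guard are all choices made for this example.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio between the energy of the first n_first bands and the rest.

    Assumption: each group's energy value is the plain sum of its bands.
    """
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / max(second, eps)


def update_long_term(prev_long_term, value, alpha=0.9):
    """First-order smoothing of the parameter across frames (assumed form)."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

For band energies `[1, 1, 2, 2]` split after the second band, the ratio is 0.5; smoothing it into a previous long-term value of 1.0 with `alpha=0.5` gives 0.75.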

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (进行上述) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (平均值) of the long-term correlation map .
CN1905006A
CLAIM 3
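. The noise suppression system according to claim 1, characterized in that the means for correcting the temporarily estimated speech assumes probability distributions as the standard pattern, obtains an expected speech value from the probability that each probability distribution constituting the standard pattern outputs the temporarily estimated speech and from the mean, 平均值 (initial value, term value), of each such probability distribution, and sets the expected speech value as the corrected value of the temporarily estimated speech.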

CN1905006A
CLAIM 6
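. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.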

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (进行上述) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (平均值) of the long-term correlation map .
CN1905006A
CLAIM 3
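. The noise suppression system according to claim 1, characterized in that the means for correcting the temporarily estimated speech assumes probability distributions as the standard pattern, obtains an expected speech value from the probability that each probability distribution constituting the standard pattern outputs the temporarily estimated speech and from the mean, 平均值 (initial value, term value), of each such probability distribution, and sets the expected speech value as the corrected value of the temporarily estimated speech.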

CN1905006A
CLAIM 6
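. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.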

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (进行上述) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
CN1905006A
CLAIM 6
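. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.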

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (进行上述) .
CN1905006A
CLAIM 6
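. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.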

US8990073B2
CLAIM 35
. A device for detecting sound activity (声音信号) in a sound signal (进行上述) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1905006A
CLAIM 6
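. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.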

CN1905006A
CLAIM 19
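. A speech recognition device, characterized by having the noise suppression system according to claim 1 and comprising means for receiving the noise-suppressed speech signal, 声音信号 (detecting sound activity), from the noise suppression system and performing speech recognition.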

US8990073B2
CLAIM 36
. A device for detecting sound activity (声音信号) in a sound signal (进行上述) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1905006A
CLAIM 6
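. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.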

CN1905006A
CLAIM 19
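. A speech recognition device, characterized by having the noise suppression system according to claim 1 and comprising means for receiving the noise-suppressed speech signal, 声音信号 (detecting sound activity), from the noise suppression system and performing speech recognition.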

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (进行上述) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
CN1905006A
CLAIM 6
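. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.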

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (进行上述) .
CN1905006A
CLAIM 6
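. The noise suppression system according to claim 5, characterized by comprising means for calculating, from the standard deviation of the noise, the temporarily estimated speech and a reliability of the temporarily estimated speech, wherein the value and the reliability of the temporarily estimated speech are taken into account to carry out the aforementioned, 进行上述 (sound signal, sound signal prevents updating), correction of the temporarily estimated speech.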




US20070016411A1

Filed: 2006-07-12     Issued: 2007-01-18

Method and apparatus to encode/decode low bit-rate audio signal

(Original Assignee) Samsung Electronics Co Ltd     (Current Assignee) Samsung Electronics Co Ltd

Junghoe Kim, Eunmi Oh, Boria Kudryashov, Koostantin Osipoy
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (high frequency component, high frequency band, more bands, specific band) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .
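
The correlation map of claim 4 of US8990073B2 (one normalized correlation value per detected peak, spread over the bins between the two minima that delimit the peak) can be illustrated with a short sketch. It is an assumption-laden example rather than the patented formula: the conventional normalized cross-correlation used here, the function name, and the handling of a shared minimum bin (the later peak overwrites it) are choices made only for illustration.

```python
import math


def correlation_map(curr, prev, minima):
    """Per-bin correlation map between current and previous residual spectra.

    curr, prev: residual spectra of equal length.
    minima: sorted bin indices of the minima of curr; each pair of
    consecutive minima delimits one peak.
    """
    cmap = [0.0] * len(curr)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(curr[i] * prev[i] for i in bins)
        den = math.sqrt(sum(curr[i] ** 2 for i in bins) *
                        sum(prev[i] ** 2 for i in bins))
        # Score of the peak: normalized correlation, or 0 when either shape is empty.
        score = num / den if den > 0.0 else 0.0
        # Assign the score to every bin of the peak; a minimum shared with
        # the next peak is overwritten by that next peak (assumed tie rule).
        for i in bins:
            cmap[i] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores 1.0, while a peak that has no counterpart in the previous frame scores 0.0.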

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (high frequency component, high frequency band, more bands, specific band) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (high frequency component, high frequency band, more bands, specific band) so as to produce a summed long-term correlation map .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .
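
The long-term correlation map of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over the bins) can be sketched as follows. The update factor `alpha`, its default value, and the zero initial map in the usage below are hypothetical; claim 1 only states that the long-term map depends on an update factor, the current correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) filter applied independently per bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


# Usage: start from an assumed all-zero initial map and feed two frames.
state = [0.0, 0.0, 0.0]
state = update_long_term_map(state, [1.0, 1.0, 0.0], alpha=0.5)
state = update_long_term_map(state, [1.0, 1.0, 0.0], alpha=0.5)
total = summed_long_term_map(state)
```

Bins that stay correlated across frames drift toward 1.0, so the summed value grows for tonally stable input and decays when the correlation disappears.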

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (high frequency component, high frequency band, more bands, specific band) having a magnitude that exceeds a given fixed threshold .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (more codebooks) in the sound signal .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 10
. A method for detecting sound activity (more codebooks) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (more codebooks) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (more codebooks) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (more codebooks) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .
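
Illustrative note on claim 14 of US8990073B2 (not part of either charted claim): the comparison of an average SNR with a threshold that depends on the long-term SNR can be sketched as below. The linear form of the threshold and the `slope` and `offset` values are assumptions made only for illustration; the claim requires only that the threshold be a function of SNR LT.

```python
def snr_based_activity(snr_av, snr_lt, slope=0.1, offset=2.0):
    """Return True when the frame is judged active.

    slope and offset are hypothetical constants; the claim only states
    that the threshold is calculated as a function of the long-term SNR.
    """
    threshold = slope * snr_lt + offset  # assumed linear threshold
    return snr_av > threshold
```

With these placeholder constants, a long-term SNR of 20 gives a threshold of 4, so an average SNR of 10 is treated as active and an average SNR of 3 as inactive.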

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (more codebooks) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (more codebooks) detection further comprises updating the noise estimates for a next frame .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (high frequency component, high frequency band, more bands, specific band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .
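
Illustrative note on claim 24 of US8990073B2 (not part of either charted claim): the two recited steps, a per-band energy ratio between the current and previous frame followed by a weighted sum over the bands above a given number, might look like the sketch below. The names `min_band` and `weights`, the uniform default weights, and the handling of zero previous energy are assumptions; the claim does not fix them.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of current/previous energy ratios for bands above min_band.

    cur_energy and prev_energy hold one energy value per frequency band.
    weights defaults to 1.0 per band (an assumption, not from the claim).
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):
        prev = prev_energy[band]
        # Bands with no previous energy contribute nothing (assumed guard).
        ratio = cur_energy[band] / prev if prev > 0 else 0.0
        total += weights[band] * ratio
    return total
```

For band energies [1, 2, 4, 8] in the current frame and [1, 1, 2, 2] in the previous frame, with `min_band=1`, only bands 2 and 3 are used and the result is 4/2 + 8/2 = 6.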

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (high frequency component, high frequency band, more bands, specific band) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency (first frequency) component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .
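
Illustrative note on claim 28 of US8990073B2 (not part of either charted claim): the recited sequence of splitting the bands into two groups, computing one energy value per group, taking their ratio, and tracking a long-term value can be sketched as follows. The split point `n_first`, the first-order smoothing, and the factor `alpha` are assumptions; the claim states only that a long-term value is calculated from the parameter.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    rest = sum(band_energy[n_first:])
    return first / rest if rest > 0 else 0.0  # assumed guard for empty upper bands


def update_long_term(long_term, value, alpha=0.9):
    """First-order smoothing of the parameter; alpha is a hypothetical constant."""
    return alpha * long_term + (1.0 - alpha) * value
```

For band energies [2, 2, 1, 1] and `n_first=2` the ratio is 4/2 = 2; smoothing from a long-term value of 0 with `alpha=0.5` then gives 1.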

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (high frequency component, high frequency band, more bands, specific band) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (high frequency component, high frequency band, more bands, specific band) so as to produce a summed long-term correlation map .
US20070016411A1
CLAIM 1
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a specific frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the specific frequency component in a specific band (frequency bins, frequency bin, frequency bands, first frequency bands) unit and quantizing and losslessly-encoding the envelope ;
selecting a codebook that is most similar to the other frequency component to be encoded from the codebooks and determining a codebook index (fine structure) ;
losslessly-encoding the determined codebook index ;
and generating a bit stream using losslessly-encoded data generated in the lossless-encoding of the specific frequency component , the envelope , and the determined codebook index .

US20070016411A1
CLAIM 4
. A method of encoding a low bit-rate audio signal , the method comprising : quantizing and losslessly-encoding a significant frequency component of an audio signal in a frequency domain ;
generating codebooks using the audio signal in the frequency domain ;
detecting an envelope of a frequency component of the audio signal other than the significant frequency component in a specific band unit and quantizing and losslessly-encoding the detected envelope of the other frequency component ;
checking whether a codebook having at least a predetermined similarity exists among the generated codebooks with respect to a high frequency band (frequency bins, frequency bin, frequency bands, first frequency bands) to be encoded ;
if the similar codebook exists , selecting the similar codebook , determining a codebook index , and losslessly-encoding the determined codebook index and information indicating that the similar codebook exists ;
if a similar codebook does not exist , losslessly-encoding information indicating that a similar codebook does not exist ;
and generating a bit stream using losslessly-encoded data generated in the lossless encoding of the significant frequency component , the envelope of the other frequency component , the determined codebook index , and the information indicating that the similar codebook does not exist .

US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands (frequency bins, frequency bin, frequency bands, first frequency bands) of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US20070016411A1
CLAIM 26
. A method of decoding a low bit-rate audio signal , the method comprising : restoring and dividing a bit stream into a significant frequency component and a frequency component other than the significant frequency component ;
losslessly-decoding and inversely quantizing the significant frequency component ;
losslessly-decoding information as to whether a similar codebook exists ;
if a similar codebook exists , restoring codebook index information and envelope information about the other frequency component ;
generating codebooks using the significant frequency component which is lossless-decoded and inversely quantized and restoring a high frequency component (frequency bins, frequency bin, frequency bands, first frequency bands) using the restored codebook index information and the restored envelope information about the other frequency component ;
and if a similar codebook does not exist , restoring the envelope information and restoring the other frequency component using a signal of a previous band and the restored envelope information .
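
Illustrative note on claim 33 of US8990073B2 (not part of either charted claim): the filter and adder can be pictured as a per-bin recursive update followed by a sum over bins. The one-pole form is taken from method claim 5; the coefficient `alpha` is a hypothetical value, since neither claim specifies it.

```python
def update_long_term_map(long_term_map, correlation_map, alpha=0.9):
    """Filter the correlation map bin by bin, then sum over all bins.

    Returns the updated long-term map and its sum.
    alpha is an assumed filter coefficient, not taken from the claims.
    """
    updated = [
        alpha * lt + (1.0 - alpha) * c
        for lt, c in zip(long_term_map, correlation_map)
    ]
    return updated, sum(updated)
```

Starting from an all-zero long-term map, a correlation map of [1, 1] with `alpha=0.5` yields [0.5, 0.5] and a summed value of 1.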

US8990073B2
CLAIM 35
. A device for detecting sound activity (more codebooks) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 36
. A device for detecting sound activity (more codebooks) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (more codebooks) detector .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (more codebooks) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (more codebooks) detector .
US20070016411A1
CLAIM 21
. An encoding apparatus , comprising : a first quantizing/encoding unit to quantize a first frequency component of a full spectrum of an audio signal and to encode the quantized first frequency component ;
a second quantizing/encoding unit to quantize one or more envelopes of one or more bands of a second frequency component of the full spectrum and to encode the quantized one or more envelopes ;
a codebook unit to generate one or more codebooks (sound activity, sound activity detection) from one or more bands of the first frequency component , to determine whether a similar codebook exists for each of the bands of the second frequency component , and to encode codebook similarity information to indicate similarities between the bands of the second frequency components and the codebooks ;
and a bit stream unit to generate a bitstream including the encoded first frequency component , the encoded envelopes of the bands of the second frequency components , and the encoded similarity information .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
WO2007001068A1

Filed: 2006-06-27     Issued: 2007-01-04

Sound classification system and method capable of adding and correcting a sound type

(Original Assignee) Matsushita Electric Industrial Co., Ltd.     

Chia-Shin Yen, Che-Ming Lin, Koichiro Mizushima
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (frequency bins) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2007001068A1
CLAIM 5
. The sound classification system according to Claim 1 , wherein said feature extractor analyzes frequency bins (frequency bins) of said sound signal spectrum to serve as the feature of said sound signal .
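
The per-peak correlation step recited in claim 4 of US8990073B2 can be illustrated with a minimal sketch. It assumes a cosine-style normalization, which the claim does not specify, and takes the peak boundaries as given; the names `correlation_map`, `cur`, `prev`, and `peaks` are hypothetical.

```python
import math
from typing import List, Tuple


def correlation_map(cur: List[float], prev: List[float],
                    peaks: List[Tuple[int, int]]) -> List[float]:
    """Assign each peak's normalized correlation to all bins of that peak.

    `peaks` holds (lo, hi) bin indices of the two minima delimiting each peak.
    The normalization below (dot product over the product of norms) is an
    assumption made for illustration only.
    """
    cmap = [0.0] * len(cur)
    for lo, hi in peaks:
        bins = range(lo, hi + 1)
        num = sum(cur[k] * prev[k] for k in bins)
        den = math.sqrt(sum(cur[k] ** 2 for k in bins) *
                        sum(prev[k] ** 2 for k in bins))
        score = num / den if den > 0 else 0.0
        # A minimum shared by two adjacent peaks takes the later peak's score.
        for k in bins:
            cmap[k] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores 1.0 over its bins, while a peak with no counterpart in the previous residual spectrum scores 0.0.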

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
WO2007001068A1
CLAIM 5
. The sound classification system according to Claim 1 , wherein said feature extractor analyzes frequency bins (frequency bins) of said sound signal spectrum to serve as the feature of said sound signal .
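
The one-pole filtering and summation recited in claim 5 of US8990073B2 can be sketched as follows. The update factor `alpha` and the function names are hypothetical; the claim names a one-pole filter but gives no coefficient value.

```python
from typing import List


def update_long_term_map(long_term: List[float], cmap: List[float],
                         alpha: float) -> List[float]:
    """One-pole (first-order recursive) smoothing applied bin by bin.

    `alpha` is an assumed update factor in [0, 1); a larger value gives
    more weight to the history held in `long_term`.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term: List[float]) -> float:
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Starting from an all-zero initial map, repeated calls with a stable correlation map move the summed value toward the sum of that map, which is how a sustained tone would accumulate evidence over frames.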

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency bins) having a magnitude that exceeds a given fixed threshold .
WO2007001068A1
CLAIM 5
. The sound classification system according to Claim 1 , wherein said feature extractor analyzes frequency bins (frequency bins) of said sound signal spectrum to serve as the feature of said sound signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction (classification results) residual error energies .
WO2007001068A1
CLAIM 12
. The sound classification system according to Claim 11 , wherein said selection area includes a record window and a scroll-and-select key , said record window being capable of displaying a plurality of entries of sound classification results (linear prediction, activity prediction parameter) for selection by the user so as to correct the type of a sound or to add a new type for the sound , said record window also displaying an icon representing the type of the selected sound , said scroll-and-select key being capable of controlling said record window to display the classification result of the sound of the type that is to be corrected or added .

WO2007001068A1
CLAIM 20
. A method for correcting a sound type , said method being adapted for use in a sound classification system , said sound classification system including a first database for storing the statistical values of the features of a plurality of sounds , a classifier , a second database , a feature database for storing features of a plurality of sample sounds that have been accurately classified , an add/correct command processor , a type adding/correcting device , and a precision calculator , said method comprising the following steps (second order) of : (A) instructing the add/correct command processor to receive a command to correct a sound type ;
(B)storing the statistical values of the features of the sound types in the first database into the second database to make a backup copy of data in the first database ;
(C)instructing the type adding/correcting device to add the feature of a sound requiring type correction to a type in the first database which was selected by the user , and to re-calculate the statistical values of the features of the sounds of the selected type in the first database ;
(D)instructing the classifier to retrieve the features of all the sample sounds in the feature database and to re-determine the classifications of the features of the sample sounds according to the statistical values of the features of each of the sound types in the first database , and instructing the precision calculator to calculate a ratio of accurate classification of the features of the sample sounds by the classifier ;
and (E)instructing the type adding/correcting device to store the feature of the sound of the type to be corrected in the feature database if the ratio of accurate classification of the features of the sample sounds by the classifier is greater than a threshold value , and otherwise , instructing the second database to store the backup copy of data back into the first database .

WO2007001068A1
CLAIM 21
. The method for correcting a sound type according to Claim 20 , wherein the statistical values of each sound type include a mean and a variance of all the features of sound signals (linear prediction residual error energies) in the respective type .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (input window) in order to distinguish a music signal from a background noise signal and prevent update (input window) of noise energy estimates on the music signal .
WO2007001068A1
CLAIM 14
. The sound classification system according to Claim 12 or Claim 13 , wherein said add/correct command processor is capable of displaying an add sound type operation interface upon receipt of a command to add a sound type , said add sound type operation interface including a type name input window (noise character parameter, prevent update) and an add type prompt window , said type name input window including a type name input field , an add type confirm key , and an add type cancel key , said type name input field being capable of receiving the name of a sound type to be added , said add/correct command processor displaying important messages via said add type prompt window , said add/correct command processor inspecting whether the name of the sound type entered into said type name input field already exists and , in the affirmative , displaying that the name of the sound type already exists via said add type prompt window , said add type confirm key being capable of receiving a command to confirm the command to add a sound type , said add type cancel button being capable of canceling the command to add a sound type .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (splay area) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
WO2007001068A1
CLAIM 15
. The sound classification system according to Claim 12 or Claim 13 , wherein said add/correct command processor is capable of displaying a correct sound type operation interface upon receipt of a command to correct a sound type , said correct sound type operation interface including an existing sound type window and a correct type prompt window , said existing sound type window including an existing sound type display area (frequency bands) , a correct type confirm key , and a correct type cancel key , said existing sound type display area being capable of displaying all the existing sound types which are available for selection by the user so as to replace the sound type of the selected sound classification result in the selection area of said add/correct sound type operation interface , said add/correct command processor displaying important messages via said correct type prompt window , said correct type confirm key being capable of receiving a command to confirm a command to correct the sound type , said correct type cancel button being capable of receiving a command to cancel the command to correct the sound type .
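
The spectral diversity computation recited in claim 24 of US8990073B2 can be sketched as follows. The direction of the ratio (current over previous), the exclusive treatment of `min_band`, the weights, and the guard value `eps` are all assumptions; the claim only requires a per-band energy ratio and a weighted sum.

```python
from typing import List


def spectral_diversity(cur_energy: List[float], prev_energy: List[float],
                       min_band: int, weights: List[float],
                       eps: float = 1e-12) -> float:
    """Weighted sum of per-band energy ratios for bands above `min_band`.

    `weights` is a hypothetical per-band weighting; `eps` guards against
    division by zero and is not part of the claim.
    """
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[b] / max(prev_energy[b], eps)
        total += weights[b] * ratio
    return total
```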

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (classification results) indicative of an activity of the sound signal .
WO2007001068A1
CLAIM 12
. The sound classification system according to Claim 11 , wherein said selection area includes a record window and a scroll-and-select key , said record window being capable of displaying a plurality of entries of sound classification results (linear prediction, activity prediction parameter) for selection by the user so as to correct the type of a sound or to add a new type for the sound , said record window also displaying an icon representing the type of the selected sound , said scroll-and-select key being capable of controlling said record window to display the classification result of the sound of the type that is to be corrected or added .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (classification results) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
WO2007001068A1
CLAIM 12
. The sound classification system according to Claim 11 , wherein said selection area includes a record window and a scroll-and-select key , said record window being capable of displaying a plurality of entries of sound classification results (linear prediction, activity prediction parameter) for selection by the user so as to correct the type of a sound or to add a new type for the sound , said record window also displaying an icon representing the type of the selected sound , said scroll-and-select key being capable of controlling said record window to display the classification result of the sound of the type that is to be corrected or added .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (classification results) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
WO2007001068A1
CLAIM 12
. The sound classification system according to Claim 11 , wherein said selection area includes a record window and a scroll-and-select key , said record window being capable of displaying a plurality of entries of sound classification results (linear prediction, activity prediction parameter) for selection by the user so as to correct the type of a sound or to add a new type for the sound , said record window also displaying an icon representing the type of the selected sound , said scroll-and-select key being capable of controlling said record window to display the classification result of the sound of the type that is to be corrected or added .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (input window) comprises : dividing a plurality of frequency bands (splay area) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2007001068A1
CLAIM 14
. The sound classification system according to Claim 12 or Claim 13 , wherein said add/correct command processor is capable of displaying an add sound type operation interface upon receipt of a command to add a sound type , said add sound type operation interface including a type name input window (noise character parameter, prevent update) and an add type prompt window , said type name input window including a type name input field , an add type confirm key , and an add type cancel key , said type name input field being capable of receiving the name of a sound type to be added , said add/correct command processor displaying important messages via said add type prompt window , said add/correct command processor inspecting whether the name of the sound type entered into said type name input field already exists and , in the affirmative , displaying that the name of the sound type already exists via said add type prompt window , said add type confirm key being capable of receiving a command to confirm the command to add a sound type , said add type cancel button being capable of canceling the command to add a sound type .

WO2007001068A1
CLAIM 15
. The sound classification system according to Claim 12 or Claim 13 , wherein said add/correct command processor is capable of displaying a correct sound type operation interface upon receipt of a command to correct a sound type , said correct sound type operation interface including an existing sound type window and a correct type prompt window , said existing sound type window including an existing sound type display area (frequency bands) , a correct type confirm key , and a correct type cancel key , said existing sound type display area being capable of displaying all the existing sound types which are available for selection by the user so as to replace the sound type of the selected sound classification result in the selection area of said add/correct sound type operation interface , said add/correct command processor displaying important messages via said correct type prompt window , said correct type confirm key being capable of receiving a command to confirm a command to correct the sound type , said correct type cancel button being capable of receiving a command to cancel the command to correct the sound type .
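
The noise character parameter recited in claim 28 of US8990073B2 can be sketched as follows. The ratio direction (first group over second group), the smoothing factor `alpha`, and the guard value `eps` are assumptions made for illustration; the claim fixes none of them.

```python
from typing import List


def noise_character(band_energy: List[float], n_first: int,
                    eps: float = 1e-12) -> float:
    """Ratio of the energy in the first `n_first` bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    return first / max(second, eps)


def long_term_value(previous: float, current: float, alpha: float) -> float:
    """Recursive average used here as the long-term value (assumed form)."""
    return alpha * previous + (1.0 - alpha) * current
```

Under claim 29, a noise estimator would then skip its update whenever the long-term value falls below a fixed threshold.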

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (input window) inferior than a given fixed threshold .
WO2007001068A1
CLAIM 14
. The sound classification system according to Claim 12 or Claim 13 , wherein said add/correct command processor is capable of displaying an add sound type operation interface upon receipt of a command to add a sound type , said add sound type operation interface including a type name input window (noise character parameter, prevent update) and an add type prompt window , said type name input window including a type name input field , an add type confirm key , and an add type cancel key , said type name input field being capable of receiving the name of a sound type to be added , said add/correct command processor displaying important messages via said add type prompt window , said add/correct command processor inspecting whether the name of the sound type entered into said type name input field already exists and , in the affirmative , displaying that the name of the sound type already exists via said add type prompt window , said add type confirm key being capable of receiving a command to confirm the command to add a sound type , said add type cancel button being capable of canceling the command to add a sound type .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
WO2007001068A1
CLAIM 5
. The sound classification system according to Claim 1 , wherein said feature extractor analyzes frequency bins (frequency bins) of said sound signal spectrum to serve as the feature of said sound signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2007011341A

Filed: 2006-06-22     Issued: 2007-01-18

Frequency extension of harmonic signals

(Original Assignee) Harman Becker Automotive Systems-Wavemakers Inc

David Giesbrecht, Phillip Hetherington, Xueman Li
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2007011341A
CLAIM 8
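. The method according to claim 1 , wherein , in the step of performing a non-linear transformation on the complex spectrum of the band-limited harmonic signal in the frequency domain , the non-linear transformation is selected such that harmonic energy is added to at least one harmonic frequency [高調波周波数] (frequency spectrum, frequency dependent signal) above the upper frequency limit of the band-limited harmonic signal .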
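
The peak detection step recited in claim 1 of US8990073B2, where each peak is the piece of the residual spectrum between two successive minima, can be sketched as follows. Treating the first and last bins as minima is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple


def find_minima(spectrum: List[float]) -> List[int]:
    """Indices of local minima; the end bins are included by assumption."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def peaks_between_minima(spectrum: List[float]) -> List[Tuple[int, int]]:
    """Each peak is returned as the (lo, hi) pair of minima that delimit it."""
    minima = find_minima(spectrum)
    return list(zip(minima[:-1], minima[1:]))
```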

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (高調波周波数) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2007011341A
CLAIM 8
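. The method according to claim 1 , wherein , in the step of performing a non-linear transformation on the complex spectrum of the band-limited harmonic signal in the frequency domain , the non-linear transformation is selected such that harmonic energy is added to at least one harmonic frequency [高調波周波数] (frequency spectrum, frequency dependent signal) above the upper frequency limit of the band-limited harmonic signal .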
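
The residual spectrum calculation recited in claim 2 of US8990073B2 (find the minima, connect them to form a spectral floor, subtract the floor) can be sketched as follows. Connecting the minima by straight lines and treating the end bins as minima are assumptions; the claim only says the minima are connected with each other.

```python
from typing import List


def find_minima(spectrum: List[float]) -> List[int]:
    """Indices of local minima; the end bins are included by assumption."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def residual_spectrum(spectrum: List[float]) -> List[float]:
    """Subtract a piecewise-linear floor drawn through the minima."""
    minima = find_minima(spectrum)
    floor = [0.0] * len(spectrum)
    for lo, hi in zip(minima[:-1], minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return [s - f for s, f in zip(spectrum, floor)]
```

With this construction the residual is zero at every minimum, so the peaks stand out as the positive segments between them.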

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (...) ;

wherein the tonal stability estimation (モジュール) is performed according to claim 1 .
JP2007011341A
CLAIM 4
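. The method according to claim 3 , wherein the linear convolution is performed according to the formula Y(k) = X(k) * X(k) , k = 0 ... (background noise signal) N/2 , where * denotes the linear convolution operation , k is the frequency index , and N is the length of the FFT used to transform the band-limited harmonic signal from the time domain to the frequency domain .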

JP2007011341A
CLAIM 21
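. A system for extending the harmonics of a band-limited harmonic signal , the system comprising : means for receiving a band-limited harmonic signal ; a signal processor having a forward transform module [順変換モジュール] (tonal sound, tonal stability estimation, tonal sound signal, tonal stability parameter estimation) for transforming the received band-limited harmonic signal from the time domain to the frequency domain to generate a complex spectrum of the band-limited harmonic signal ; a harmonic generation module for performing a non-linear transformation of the complex spectrum of the band-limited signal in the frequency domain to generate a harmonically extended spectrum of the band-limited harmonic signal ; and an inverse transform module for transforming the harmonically extended signal of the band-limited harmonic signal back to the time domain .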

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound (モジュール) signal is detected .
JP2007011341A
CLAIM 21
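. A system for extending the harmonics of a band-limited harmonic signal , the system comprising : means for receiving a band-limited harmonic signal ; a signal processor having a forward transform module [順変換モジュール] (tonal sound, tonal stability estimation, tonal sound signal, tonal stability parameter estimation) for transforming the received band-limited harmonic signal from the time domain to the frequency domain to generate a complex spectrum of the band-limited harmonic signal ; a harmonic generation module for performing a non-linear transformation of the complex spectrum of the band-limited signal in the frequency domain to generate a harmonically extended spectrum of the band-limited harmonic signal ; and an inverse transform module for transforming the harmonically extended signal of the band-limited harmonic signal back to the time domain .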

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (...) and prevent update of noise energy estimates on the music signal .
JP2007011341A
CLAIM 4
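. The method according to claim 3 , wherein the linear convolution is performed according to the formula Y(k) = X(k) * X(k) , k = 0 ... (background noise signal) N/2 , where * denotes the linear convolution operation , k is the frequency index , and N is the length of the FFT used to transform the band-limited harmonic signal from the time domain to the frequency domain .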

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007011341A
CLAIM 8
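. The method according to claim 1 , wherein , in the step of performing a non-linear transformation on the complex spectrum of the band-limited harmonic signal in the frequency domain , the non-linear transformation is selected such that harmonic energy is added to at least one harmonic frequency [高調波周波数] (frequency spectrum, frequency dependent signal) above the upper frequency limit of the band-limited harmonic signal .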

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (高調波周波数) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007011341A
CLAIM 8
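. The method according to claim 1 , wherein , in the step of performing a non-linear transformation on the complex spectrum of the band-limited harmonic signal in the frequency domain , the non-linear transformation is selected such that harmonic energy is added to at least one harmonic frequency [高調波周波数] (frequency spectrum, frequency dependent signal) above the upper frequency limit of the band-limited harmonic signal .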

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (高調波周波数) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2007011341A
CLAIM 8
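. The method according to claim 1 , wherein , in the step of performing a non-linear transformation on the complex spectrum of the band-limited harmonic signal in the frequency domain , the non-linear transformation is selected such that harmonic energy is added to at least one harmonic frequency [高調波周波数] (frequency spectrum, frequency dependent signal) above the upper frequency limit of the band-limited harmonic signal .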

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (...) ;

wherein the tonal stability parameter estimation (モジュール) means comprises a device according to claim 30 .
JP2007011341A
CLAIM 4
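. The method according to claim 3 , wherein the linear convolution is performed according to the formula Y(k) = X(k) * X(k) , k = 0 ... (background noise signal) N/2 , where * denotes the linear convolution operation , k is the frequency index , and N is the length of the FFT used to transform the band-limited harmonic signal from the time domain to the frequency domain .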

JP2007011341A
CLAIM 21
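. A system for extending the harmonics of a band-limited harmonic signal , the system comprising : means for receiving a band-limited harmonic signal ; a signal processor having a forward transform module [順変換モジュール] (tonal sound, tonal stability estimation, tonal sound signal, tonal stability parameter estimation) for transforming the received band-limited harmonic signal from the time domain to the frequency domain to generate a complex spectrum of the band-limited harmonic signal ; a harmonic generation module for performing a non-linear transformation of the complex spectrum of the band-limited signal in the frequency domain to generate a harmonically extended spectrum of the band-limited harmonic signal ; and an inverse transform module for transforming the harmonically extended signal of the band-limited harmonic signal back to the time domain .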

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (...) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2007011341A
CLAIM 4
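. The method according to claim 3 , wherein the linear convolution is performed according to the formula Y(k) = X(k) * X(k) , k = 0 ... (background noise signal) N/2 , where * denotes the linear convolution operation , k is the frequency index , and N is the length of the FFT used to transform the band-limited harmonic signal from the time domain to the frequency domain .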

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (...) and preventing update of noise energy estimates .
JP2007011341A
CLAIM 4
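. The method according to claim 3 , wherein the linear convolution is performed according to the formula Y(k) = X(k) * X(k) , k = 0 ... (background noise signal) N/2 , where * denotes the linear convolution operation , k is the frequency index , and N is the length of the FFT used to transform the band-limited harmonic signal from the time domain to the frequency domain .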




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060251178A1

Filed: 2006-04-07     Issued: 2006-11-09

Encoder apparatus and decoder apparatus

(Original Assignee) Panasonic Corp     (Current Assignee) Panasonic Intellectual Property Corp

Masahiro Oshikiri
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (filter characteristic) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map (decoding method) .
US20060251178A1
CLAIM 32
. The decoding apparatus according to claim 30 , wherein said decoding section comprises an estimation section that includes said spectrum of the low frequency band as an internal state and estimates said spectrum of the high frequency band using a filter having said parameter as a filter characteristic (frequency spectrum) , a filter function of said filter is expressed by the following equation , and said estimation section performs said estimation using a zero input response of said filter . $P(z) = \dfrac{1}{1 - \sum_{i=-M}^{M} \beta_i z^{-T+i}}$ , where P(z) is the filter function , z is the z-transform variable , 2M+1 is the order of the filter , $\beta_i$ is the weighting factor , and T is the pitch coefficient .

US20060251178A1
CLAIM 37
. A decoding method (second group, term correlation map, second energy values) comprising the steps of : acquiring a spectrum having a low frequency band out of a spectrum having a low frequency band and high-frequency band ;
acquiring a parameter indicating a degree of similarity between said spectrum of the low frequency band and said spectrum of the high frequency band ;
and decoding said spectrum of the low frequency band and said spectrum of the high frequency band using said spectrum of the low frequency band and said parameter .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (filter characteristic) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060251178A1
CLAIM 32
. The decoding apparatus according to claim 30 , wherein said decoding section comprises an estimation section that includes said spectrum of the low frequency band as an internal state and estimates said spectrum of the high frequency band using a filter having said parameter as a filter characteristic (frequency spectrum) , a filter function of said filter is expressed by the following equation , and said estimation section performs said estimation using a zero input response of said filter . $P(z) = \dfrac{1}{1 - \sum_{i=-M}^{M} \beta_i z^{-T+i}}$ , where P(z) is the filter function , z is the z-transform variable , 2M+1 is the order of the filter , $\beta_i$ is the weighting factor , and T is the pitch coefficient .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signal, second band, first band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20060251178A1
CLAIM 16
. A spectrum coding apparatus that encodes a spectrum including a first band (frequency bands, first frequency bands) and a second band (frequency bands, first frequency bands) , comprising : an acquisition section that acquires information about a spectrum similar to a spectrum of said second band from a spectrum of said first band based on a harmonic structure ;
and a coding section that encodes said information instead of the spectrum of said second band .

US20060251178A1
CLAIM 28
. A scalable coding apparatus that encodes a voice signal or audio signal separated into a low frequency band and high frequency band , comprising : a first coding section that encodes a low-frequency band signal (frequency bands, first frequency bands) of said voice signal or said audio signal ;
and a second coding section that encodes a high-frequency band signal of said voice signal or said audio signal using said low-frequency band signal , wherein said second coding section comprises : a first spectrum generation section that performs frequency domain conversion on said low-frequency band signal and generates a spectrum of the low frequency band ;
a second spectrum generation section that performs frequency domain conversion on said voice signal or said audio signal and generates a spectrum having the low frequency band and high frequency band ;
and the coding apparatus of claim 19 wherein said acquisition section acquires spectra generated by said first and second spectrum generation sections .
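The spectral diversity calculation recited in claim 24 of US8990073B2 above can be illustrated with a minimal sketch. It is not taken from either document: the function name, the `weights` and `eps` parameters, and the reading of "higher than a given number" as band indices strictly greater than `min_band` are all assumptions made for illustration.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios (current frame / previous frame).

    Only bands with index strictly greater than `min_band` contribute.
    `weights` and `eps` are hypothetical parameters, not values from the patent.
    """
    if len(curr_energy) != len(prev_energy):
        raise ValueError("band energy lists must have the same length")
    n_bands = len(curr_energy)
    if weights is None:
        weights = [1.0] * n_bands  # assumption: unit weights when none are given
    total = 0.0
    for b in range(min_band + 1, n_bands):
        # eps guards against division by zero on silent bands
        ratio = curr_energy[b] / (prev_energy[b] + eps)
        total += weights[b] * ratio
    return total
```

With unit weights, a frame whose upper-band energies doubled and quadrupled relative to the previous frame yields a value near 6, whereas a stationary signal yields a value near the number of bands considered.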

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signal, second band, first band) into a first group of a certain number of first frequency bands and a second group (decoding method) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (decoding method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060251178A1
CLAIM 16
. A spectrum coding apparatus that encodes a spectrum including a first band (frequency bands, first frequency bands) and a second band (frequency bands, first frequency bands) , comprising : an acquisition section that acquires information about a spectrum similar to a spectrum of said second band from a spectrum of said first band based on a harmonic structure ;
and a coding section that encodes said information instead of the spectrum of said second band .

US20060251178A1
CLAIM 28
. A scalable coding apparatus that encodes a voice signal or audio signal separated into a low frequency band and high frequency band , comprising : a first coding section that encodes a low-frequency band signal (frequency bands, first frequency bands) of said voice signal or said audio signal ;
and a second coding section that encodes a high-frequency band signal of said voice signal or said audio signal using said low-frequency band signal , wherein said second coding section comprises : a first spectrum generation section that performs frequency domain conversion on said low-frequency band signal and generates a spectrum of the low frequency band ;
a second spectrum generation section that performs frequency domain conversion on said voice signal or said audio signal and generates a spectrum having the low frequency band and high frequency band ;
and the coding apparatus of claim 19 wherein said acquisition section acquires spectra generated by said first and second spectrum generation sections .

US20060251178A1
CLAIM 37
. A decoding method (second group, term correlation map, second energy values) comprising the steps of : acquiring a spectrum having a low frequency band out of a spectrum having a low frequency band and high-frequency band ;
acquiring a parameter indicating a degree of similarity between said spectrum of the low frequency band and said spectrum of the high frequency band ;
and decoding said spectrum of the low frequency band and said spectrum of the high frequency band using said spectrum of the low frequency band and said parameter .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (filter characteristic) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060251178A1
CLAIM 32
. The decoding apparatus according to claim 30 , wherein said decoding section comprises an estimation section that includes said spectrum of the low frequency band as an internal state and estimates said spectrum of the high frequency band using a filter having said parameter as a filter characteristic (frequency spectrum) , a filter function of said filter is expressed by the following equation , and said estimation section performs said estimation using a zero input response of said filter . $P(z) = \dfrac{1}{1 - \sum_{i=-M}^{M} \beta_i z^{-T+i}}$ where , P(z) : Filter function ; z : z conversion variable ; 2M+1 : Order of filter ; β : Weighting factor ; T : Pitch coefficient

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (filter characteristic) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060251178A1
CLAIM 32
. The decoding apparatus according to claim 30 , wherein said decoding section comprises an estimation section that includes said spectrum of the low frequency band as an internal state and estimates said spectrum of the high frequency band using a filter having said parameter as a filter characteristic (frequency spectrum) , a filter function of said filter is expressed by the following equation , and said estimation section performs said estimation using a zero input response of said filter . $P(z) = \dfrac{1}{1 - \sum_{i=-M}^{M} \beta_i z^{-T+i}}$ where , P(z) : Filter function ; z : z conversion variable ; 2M+1 : Order of filter ; β : Weighting factor ; T : Pitch coefficient

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (filter characteristic) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060251178A1
CLAIM 32
. The decoding apparatus according to claim 30 , wherein said decoding section comprises an estimation section that includes said spectrum of the low frequency band as an internal state and estimates said spectrum of the high frequency band using a filter having said parameter as a filter characteristic (frequency spectrum) , a filter function of said filter is expressed by the following equation , and said estimation section performs said estimation using a zero input response of said filter . $P(z) = \dfrac{1}{1 - \sum_{i=-M}^{M} \beta_i z^{-T+i}}$ where , P(z) : Filter function ; z : z conversion variable ; 2M+1 : Order of filter ; β : Weighting factor ; T : Pitch coefficient




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060271356A1

Filed: 2006-04-03     Issued: 2006-11-30

Systems, methods, and apparatus for quantization of spectral envelope representation

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Koen Vos
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map (corresponding portion) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .
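The residual spectrum calculation recited in claim 2 of US8990073B2 above (search for minima, connect them to form a spectral floor, subtract the floor) can be sketched as follows. This is an illustrative reading, not the patentee's implementation: the function names, the use of straight-line interpolation to "connect" the minima, and the choice to treat the first and last bins as floor anchors are assumptions.

```python
import numpy as np

def local_minima(spectrum):
    """Indices of local minima; the two end bins are always included
    as anchors (an assumption made so the floor spans every bin)."""
    s = np.asarray(spectrum, dtype=float)
    idx = [0]
    for k in range(1, len(s) - 1):
        if s[k] < s[k - 1] and s[k] <= s[k + 1]:
            idx.append(k)
    idx.append(len(s) - 1)
    return np.array(idx)

def residual_spectrum(spectrum):
    """Return (residual, minima) where residual = spectrum - spectral floor.

    The floor is built by linearly interpolating between the minima,
    which is one plausible reading of "connecting the minima".
    Expects a spectrum with at least two bins.
    """
    s = np.asarray(spectrum, dtype=float)
    minima = local_minima(s)
    floor = np.interp(np.arange(len(s)), minima, s[minima])
    return s - floor, minima
```

Under this sketch the residual is zero at every minimum and non-negative elsewhere, so each stretch between two successive minima corresponds to one peak in the sense of claim 1.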

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map (corresponding portion) comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .
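The correlation map of claim 4 of US8990073B2 above can be sketched as a per-peak normalized correlation that is then written to every bin of the peak. The sketch is hedged: the function name, the squared-correlation normalization, and the half-open bin range [lo, hi) between consecutive minima are assumptions chosen for illustration, and the claim itself does not fix them.

```python
import numpy as np

def correlation_map(curr_res, prev_res, minima):
    """Build a per-bin correlation map from two residual spectra.

    `minima` holds the bin indices of successive minima of the current
    residual spectrum; each pair (lo, hi) delimits one peak. The score
    used here is (sum c*p)^2 / (sum c^2 * sum p^2), an assumed form of
    the "normalized correlation value".
    """
    curr = np.asarray(curr_res, dtype=float)
    prev = np.asarray(prev_res, dtype=float)
    cmap = np.zeros(len(curr))
    for lo, hi in zip(minima[:-1], minima[1:]):
        c = curr[lo:hi]
        p = prev[lo:hi]
        denom = np.sum(c * c) * np.sum(p * p)
        score = (np.sum(c * p) ** 2) / denom if denom > 0 else 0.0
        cmap[lo:hi] = score  # same score for every bin of the peak
    return cmap
```

A peak that keeps its shape and position from one frame to the next scores close to 1, while a peak with no counterpart in the previous residual spectrum scores 0.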

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map (corresponding portion) comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .
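The long-term correlation map of claims 1 and 5 of US8990073B2 above (an update factor, the current frame's correlation map, an initial value, one-pole filtering per bin, then summation) can be sketched as an exponential smoother. The function names and the value of the update factor `alpha` are hypothetical and are not taken from the patent.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole filter applied bin by bin:
    new[k] = alpha * long_term[k] + (1 - alpha) * cmap[k].
    `alpha` is an assumed update factor."""
    if len(long_term) != len(cmap):
        raise ValueError("maps must have the same number of bins")
    return [alpha * y + (1.0 - alpha) * x for y, x in zip(long_term, cmap)]

def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Starting from an all-zero initial map, repeated frames with stable peaks drive the summed value upward, which is what a threshold comparison such as the one in claim 8 would then test.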

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map (corresponding portion) for frequency bins having a magnitude that exceeds a given fixed threshold .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map (corresponding portion) with an adaptive threshold indicative of sound activity in the sound signal .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder (background noise signal) configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (linear prediction) residual error energies .
US20060271356A1
CLAIM 6
. The method according to claim 1 , wherein each among the first and second vectors includes a representation of a plurality of linear prediction (linear prediction, residual error) filter coefficients .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal (first adder) and prevent update of noise energy estimates on the music signal .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .

US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder (background noise signal) configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .
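The noise character parameter of claim 28 of US8990073B2 above can be sketched as a ratio of two group energies followed by long-term smoothing. The sketch rests on several assumptions that the claim leaves open: each group's energy value is taken as a plain sum, the ratio is taken as first group over second group, and the long-term value uses an exponential update with a hypothetical factor `alpha`.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first `n_first` bands to the energy
    in the remaining bands. Summation and ratio direction are assumed."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / (second + eps)  # eps avoids division by zero

def long_term_value(previous, current, alpha=0.9):
    """Exponentially smoothed long-term value; `alpha` is hypothetical."""
    return alpha * previous + (1.0 - alpha) * current
```

Claim 29 then conditions the noise-estimate update on this parameter being below a fixed threshold, so in practice the smoothed value, not the per-frame ratio, would be the quantity compared.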

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20060271356A1
CLAIM 1
. A method for signal processing , said method comprising : encoding a first frame and a second frame of a speech signal (noise character parameter, activity prediction parameter) to produce corresponding first and second vectors , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame ;
generating a first quantized vector , said generating including quantizing a third vector that is based on at least a portion of the first vector ;
calculating a quantization error of the first quantized vector ;
calculating a fourth vector , said calculating including adding a scaled version of the quantization error to at least a portion of the second vector ;
and quantizing the fourth vector .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map (corresponding portion) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map (corresponding portion) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map (corresponding portion) comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060271356A1
CLAIM 4
. The method according to claim 1 , said method including calculating the scaled quantization error , said calculating comprising multiplying the quantization error by a scale factor , wherein the scale factor is based on a distance between at least a portion of the first vector and a corresponding portion (correlation map, term correlation map) of the second vector .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder (background noise signal) configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder (background noise signal) configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder (frequency spectrum, noise estimator) configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first adder) and preventing update of noise energy estimates .
US20060271356A1
CLAIM 9
. An apparatus comprising : a speech encoder configured to encode a first frame of a speech signal into at least a first vector and to encode a second frame of a speech signal into at least a second vector , wherein the first vector represents a spectral envelope of the speech signal during the first frame and the second vector represents a spectral envelope of the speech signal during the second frame , a quantizer configured to quantize a third vector that is based on at least a portion of the first vector to generate a first quantized vector ;
a first adder (background noise signal) configured to calculate a quantization error of the first quantized vector ;
and a second adder configured to add a scaled version of the quantization error to at least a portion of the second vector to calculate a fourth vector ;
wherein said quantizer is configured to quantize the fourth vector .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20070088558A1

Filed: 2006-04-03     Issued: 2007-04-19

Systems, methods, and apparatus for speech signal filtering

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Koen Vos, Ananthapadmanabhan Kandhadai
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (different sampling rates) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (different sampling rates) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .
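
To make the steps recited in claim 2 of US8990073B2 above easier to follow (search for minima, connect them into a spectral floor, subtract the floor), a minimal Python sketch is given here. It is not taken from the patent or from US20070088558A1. The function names, the linear interpolation between minima, and the treatment of the first and last bins as minima are assumptions made only for this illustration.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are always included (an assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated spectral floor from the spectrum, bin by bin."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)]


# Toy 5-bin spectrum with two peaks (hypothetical values).
print(residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0]))
```

In this sketch the residual is zero at every bin treated as a minimum, so each peak of the residual spectrum sits between two zero-valued bins.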

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (different sampling rates) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (different sampling rates) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .
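
The correlation-map steps of claim 4 of US8990073B2 above can be sketched as follows. This is an illustrative reading only: the claim says "normalized correlation value" without giving a formula in the text charted here, so the cosine-style normalization, the helper names, and the rule that a bin shared by two peaks takes the later peak's score are all assumptions.

```python
import math


def residual_minima(residual):
    """Indices of local minima of the residual spectrum, end bins included (assumed)."""
    n = len(residual)
    idx = [0]
    for i in range(1, n - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def correlation_map(current, previous):
    """Score each peak of `current` against the same bins of `previous`."""
    minima = residual_minima(current)
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]    # the peak: bins between two consecutive minima
        prev = previous[a:b + 1]  # the shape at the same position in the previous frame
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = score       # assign the peak's score to every bin it spans
    return cmap


frame = [0.0, 2.0, 0.0, 3.5, 0.0]
print(correlation_map(frame, frame))  # identical frames: every peak scores 1.0
```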

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (gain factors) .
US20070088558A1
CLAIM 9
. The apparatus according to claim 8 , wherein the second speech encoder is configured to encode the highband signal into at least a plurality of highband filter parameters and a plurality of gain factors (SNR calculation) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (different sampling rates) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (different sampling rates) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (different sampling rates) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20070088558A1
CLAIM 5
. The apparatus according to claim 1 , wherein the lowband speech signal and the highband speech signal have different sampling rates (current residual spectrum) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060147124A1

Filed: 2006-02-15     Issued: 2006-07-06

Perceptual coding of image signals using separated irrelevancy reduction and redundancy reduction

(Original Assignee) Agere Systems LLC     (Current Assignee) Agere Systems LLC

Bernd Edler, Gerald Schuller
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (magnitude response, band signals) using a frequency spectrum (magnitude response, band signals) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (magnitude response, band signals) of the sound signal (magnitude response, band signals) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (magnitude response, band signals) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (magnitude response, band signals) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (magnitude response, band signals) so as to produce a summed long-term correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .
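
For orientation, the one-pole filtering and summation recited in claim 5 of US8990073B2 above, together with the update factor and initial value named in claim 1, might look like the Python sketch below. The recursion `alpha * previous + (1 - alpha) * current` is a common form of one-pole smoothing and is assumed here; the claims charted in this section do not state the exact recursion or the value of the update factor, and the names `update_long_term_map`, `summed_long_term_map` and `alpha` are hypothetical.

```python
def update_long_term_map(long_term, corr_map, alpha=0.9):
    """Apply a one-pole (first-order recursive) filter bin by bin.

    long_term: previous long-term map (its initial value on the first frame).
    corr_map:  correlation map of the current frame.
    alpha:     hypothetical update factor between 0 and 1.
    """
    if len(long_term) != len(corr_map):
        raise ValueError("maps must have the same number of bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, corr_map)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


# Two frames with a stable correlation pattern, starting from an all-zero initial value.
long_term = [0.0, 0.0, 0.0]
for frame_map in ([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]):
    long_term = update_long_term_map(long_term, frame_map, alpha=0.5)
print(long_term, summed_long_term_map(long_term))
```

With a stable pattern the summed value grows frame after frame, which is the sense in which a long-term map of this kind could indicate tonal stability.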

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (magnitude response, band signals) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (magnitude response, band signals) comprises searching in the correlation map for frequency bins (magnitude response, band signals) having a magnitude that exceeds a given fixed threshold .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .
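
The fixed-threshold search of claim 7 of US8990073B2 above can be illustrated with a short Python sketch. The threshold value 0.95 and the function names are hypothetical; the claim only requires "a given fixed threshold".

```python
def strong_tone_bins(corr_map, fixed_threshold=0.95):
    """Indices of bins whose correlation-map magnitude exceeds the fixed threshold."""
    return [k for k, value in enumerate(corr_map) if abs(value) > fixed_threshold]


def has_strong_tone(corr_map, fixed_threshold=0.95):
    """True if at least one bin exceeds the threshold."""
    return len(strong_tone_bins(corr_map, fixed_threshold)) > 0


print(strong_tone_bins([0.2, 0.97, 0.5, 0.99]))
```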

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (magnitude response, band signals) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (magnitude response, band signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (magnitude response, band signals) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (magnitude response, band signals) is detected .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (magnitude response, band signals) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (magnitude response, band signals) based on a frequency dependent signal-to-noise ratio (SNR) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (magnitude response, band signals) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (magnitude response, band signals) and a ratio between a second order and a sixteenth order of linear prediction residual error (redundancy reduction) energies .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 2
. The method of claim 1 , wherein said quantizing and encoding step uses a transform or analysis filter bank suitable for redundancy reduction (residual error) .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (magnitude response, band signals) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (magnitude response, band signals) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
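For orientation only, the decision recited in claims 18 and 19 can be sketched as a single threshold comparison. This is a minimal sketch under stated assumptions, not the patentee's implementation: the function names, the per-band averaging in dB, and the treatment of the equality case (classified as inactive) are all hypothetical, and the threshold calculation of claim 14 is not reproduced in this section, so the threshold is simply passed in.

```python
import math


def average_snr_db(band_energies, noise_estimates, eps=1e-12):
    """Hypothetical average SNR: mean of the per-band SNR values in dB.

    noise_estimates stands in for the noise energy estimates carried
    over from the previous frame (claim 15); the averaging rule itself
    is an assumption, not taken from the claims.
    """
    snr_db = [
        10.0 * math.log10(max(e, eps) / max(n, eps))
        for e, n in zip(band_energies, noise_estimates)
    ]
    return sum(snr_db) / len(snr_db)


def classify_sound_activity(snr_av, threshold):
    """Return 'active' when snr_av is larger than threshold, else 'inactive'.

    The claims do not address snr_av == threshold; this sketch treats
    that case as inactive (assumption).
    """
    return "active" if snr_av > threshold else "inactive"
```

A caller would compute `average_snr_db` once per frame and pass the result, together with a threshold obtained elsewhere, to `classify_sound_activity`.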
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (magnitude response, band signals) prevents updating of noise energy estimates when a music signal (magnitude response, band signals) is detected .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (magnitude response, band signals) from a background noise signal and prevent update of noise energy estimates on the music signal .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (magnitude response, band signals) energy .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (magnitude response, band signals) in a current frame and an energy of the sound signal in a previous frame , for frequency bands (magnitude response, band signals) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
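The two steps recited in claim 24 (a per-band energy ratio between the current and previous frame, then a weighted sum over the bands above a given number) can be illustrated as follows. This is a sketch under assumptions: the function name, the zero-based band indexing, the default unit weights, and the small `eps` guard against division by zero are hypothetical and do not come from the claim.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands > min_band.

    curr_energy, prev_energy: per-band energies of the current and
    previous frame. Only bands with index strictly greater than
    min_band contribute. weights, if given, align with those bands.
    """
    ratios = [
        c / max(p, eps)
        for c, p in zip(curr_energy[min_band + 1:], prev_energy[min_band + 1:])
    ]
    if weights is None:
        # Unit weights are an assumption; the claim only says "weighted sum".
        weights = [1.0] * len(ratios)
    return sum(w * r for w, r in zip(weights, ratios))
```

With unit weights the result reduces to a plain sum of the ratios, which makes the effect of the band cut-off easy to inspect.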
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (magnitude response, band signals) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (magnitude response, band signals) and the complementary non-stationarity parameter .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (magnitude response, band signals) into a first group (magnitude response, band signals) of a certain number of first frequency (magnitude response, band signals) bands and a second group of a rest of the frequency bands ;

calculating a first energy (magnitude response, band signals) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
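The four steps of claim 28 (split the bands into two groups, compute one energy value per group, take their ratio, then track a long-term value) can be sketched as below. The sketch rests on assumptions that the claim does not state: each group's energy value is taken as the sum of its band energies, the ratio is first over second, and the long-term value is a first-order recursive average with a hypothetical factor `alpha`.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Summing the band energies per group and dividing first by second
    are assumptions; the claim only recites 'a ratio between the first
    and second energy values'.
    """
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    return first_energy / max(second_energy, eps)


def update_long_term(previous_long_term, current_value, alpha=0.9):
    """First-order recursive average (assumed form of the long-term value)."""
    return alpha * previous_long_term + (1.0 - alpha) * current_value
```

Calling `update_long_term` once per frame with the latest `noise_character` output yields a smoothed parameter that changes slowly from frame to frame.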
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (magnitude response, band signals) using a frequency spectrum (magnitude response, band signals) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (magnitude response, band signals) using a frequency spectrum (magnitude response, band signals) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (magnitude response, band signals) of the sound signal (magnitude response, band signals) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (magnitude response, band signals) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (magnitude response, band signals) so as to produce a summed long-term correlation map .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (magnitude response, band signals) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (magnitude response, band signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (magnitude response, band signals) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (magnitude response, band signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (magnitude response, band signals) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (magnitude response, band signals) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .
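
For orientation, the comparator recited in claim 38 above can be sketched as follows. This is a minimal illustration only: the linear form of the threshold and the names `slope` and `offset` (and their default values) are hypothetical and are not taken from the patent or from any cited reference. The claim only requires that the threshold be some function of the long-term SNR.

```python
def snr_threshold(snr_lt, slope=0.5, offset=10.0):
    """Hypothetical threshold as a linear function of the long-term SNR (SNR_LT)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt, slope=0.5, offset=10.0):
    """Compare the average SNR (SNR_av) with the SNR_LT-dependent threshold."""
    return snr_av > snr_threshold(snr_lt, slope, offset)
```

With these assumed defaults, a long-term SNR of 20 gives a threshold of 20, so an average SNR of 30 is flagged as active and an average SNR of 15 is not.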

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (magnitude response, band signals) for distinguishing a music signal (magnitude response, band signals) from a background noise signal and preventing update of noise energy estimates .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (magnitude response, band signals) .
US20060147124A1
CLAIM 1
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) that approximates an inverse of a corresponding visibility threshold ;
and quantizing and encoding the filter output signal together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .

US20060147124A1
CLAIM 11
. A method for encoding an image signal , comprising the steps of : filtering said image signal using an adaptive filter , said adaptive filter producing a filter output signal and having a magnitude response that approximates an inverse of a corresponding visibility threshold ;
transforming the filter output signal using a plurality of subbands suitable for redundancy reduction ;
and quantizing and encoding the subband signals (average signal, average frame, sound signal, frequency spectrum, music signal, term signal, average frame energy, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) together with side information for filter adaptation control , wherein the spectral and temporal resolutions of one or more subbands utilized in said encoding are selected independent of said adaptive filter .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
WO2006067436A1

Filed: 2005-12-21     Issued: 2006-06-29

Channel impulse response estimation

(Original Assignee) Universitetet I Oslo; Samuels, Adrian, James     

Tobias Dahl, Gudbrand Eggen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (containing sample) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (containing sample) , and an initial value of the long term correlation map .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (containing sample) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (containing sample) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .
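
The three steps of claim 2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as below. This is an illustrative reading, not the patentee's implementation: treating the first and last bins as floor endpoints and connecting minima by straight lines are assumptions, and all function names are hypothetical.

```python
def find_minima(spec):
    """Return indices of local minima; the two end bins are included as endpoints (assumption)."""
    if len(spec) < 2:
        raise ValueError("spectrum needs at least two bins")
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def spectral_floor(spec, minima):
    """Connect successive minima with straight lines (piecewise-linear floor, an assumption)."""
    floor = list(spec)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spec[a] + t * (spec[b] - spec[a])
    return floor


def residual_spectrum(spec):
    """Subtract the floor from the spectrum; also return the minima for later peak detection."""
    minima = find_minima(spec)
    floor = spectral_floor(spec, minima)
    return [s - f for s, f in zip(spec, floor)], minima
```

For the toy spectrum `[1.0, 3.0, 1.0, 5.0, 2.0]` the minima fall at bins 0, 2 and 4, and the residual is `[0.0, 2.0, 0.0, 3.5, 0.0]`, i.e. two peaks standing on a zero floor.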

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (containing sample) comprises locating a maximum between each pair of two consecutive minima (singular value decomposition) of the current residual spectrum .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition (consecutive minima, two consecutive minima) of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (containing sample) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (said subset) between two consecutive minima (singular value decomposition) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2006067436A1
CLAIM 18
. A method as claimed in any preceding claim comprising calculating a subset of rows and/or columns of said inverse matrix and interpolating between said subset (frequency bins) to complete the calculated inverse matrix .

WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition (consecutive minima, two consecutive minima) of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .
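
The correlation map of claim 4 above can be sketched as follows, taking the residual spectra and the minima of the current residual spectrum as given inputs. The cosine-style normalization used here is an assumption, since the claim only says "normalized correlation value"; the function name is hypothetical. A bin shared by two adjacent peaks takes the score of the later peak in this sketch.

```python
import math


def correlation_map(cur_res, prev_res, minima):
    """Score each peak (bins between two consecutive minima) against the previous residual."""
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        cur = cur_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0  # no energy in either frame: score 0
        # Assign the peak's score to every bin delimited by the two minima.
        for k in range(a, b + 1):
            cmap[k] = score
    return cmap
```

A peak that is identical in both frames scores 1.0 over its bins, while a peak with no counterpart in the previous frame scores 0.0.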

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (said subset) so as to produce a summed long-term correlation map .
WO2006067436A1
CLAIM 18
. A method as claimed in any preceding claim comprising calculating a subset of rows and/or columns of said inverse matrix and interpolating between said subset (frequency bins) to complete the calculated inverse matrix .
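
The per-bin one-pole filtering and summation of claim 5 above can be sketched as below. The recursion form and the default value of the update factor `alpha` are assumptions chosen for illustration; the claims only state that an update factor, the current correlation map and an initial value are used. Function names are hypothetical.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing applied bin by bin.

    lt_map holds the previous long-term values (the initial value on the first frame).
    """
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)
```

Starting from an all-zero initial map, a correlation map of `[1.0, 0.5]` and `alpha=0.5` gives a long-term map of `[0.5, 0.25]` and a summed value of `0.75`.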

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (said subset) having a magnitude that exceeds a given fixed threshold .
WO2006067436A1
CLAIM 18
. A method as claimed in any preceding claim comprising calculating a subset of rows and/or columns of said inverse matrix and interpolating between said subset (frequency bins) to complete the calculated inverse matrix .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (containing sample) energy and an average frame energy (containing sample) .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (containing sample) and an energy of the sound signal in a previous frame , for frequency bands (same frequency band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

WO2006067436A1
CLAIM 22
. A method as claimed in any preceding claim comprising driving a plurality of transmitters by respective signals from the same frequency band (frequency bands, first frequency bands) , and separating the received signals at the receiver .
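
The spectral diversity computation of claim 24 above can be sketched as a weighted sum of per-band energy ratios. The direction of the ratio (current over previous), the unit default weights and the small `eps` guard against division by zero are assumptions; the names `min_band` and `weights` are hypothetical.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands with index > min_band."""
    total = 0.0
    for i in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], eps)
        w = 1.0 if weights is None else weights[i]
        total += w * ratio
    return total
```

With current band energies `[1, 2, 4, 8]`, previous energies `[1, 1, 2, 2]` and `min_band=1`, only bands 2 and 3 contribute, giving ratios 2 and 4 and a diversity of 6.0 under unit weights.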

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (same frequency band) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2006067436A1
CLAIM 22
. A method as claimed in any preceding claim comprising driving a plurality of transmitters by respective signals from the same frequency band (frequency bands, first frequency bands) , and separating the received signals at the receiver .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (containing sample) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (containing sample) , and an initial value of the long-term correlation map .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (containing sample) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (containing sample) , and an initial value of the long-term correlation map .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (containing sample) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (containing sample) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
WO2006067436A1
CLAIM 19
. A method as claimed in any preceding claim wherein the calculated inverse matrix comprises a pair of matrices derived from a subset of a singular value decomposition of the impulse signal matrix , the method comprising the step of multiplying said matrices by a vector containing sample (current residual spectrum, current frame, current frame energy, average frame energy) s of said received signal .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (said subset) so as to produce a summed long-term correlation map .
WO2006067436A1
CLAIM 18
. A method as claimed in any preceding claim comprising calculating a subset of rows and/or columns of said inverse matrix and interpolating between said subset (frequency bins) to complete the calculated inverse matrix .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060036432A1

Filed: 2005-10-12     Issued: 2006-02-16

Apparatus and method applying adaptive spectral whitening in a high-frequency reconstruction coding system

(Original Assignee) Kristofer Kjorling; Per Ekstrand; Fredrik Henn; Lars Villemoes     (Current Assignee) Dolby International AB

Kristofer Kjorling, Per Ekstrand, Fredrik Henn, Lars Villemoes
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (band signals) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (band signals) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .
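Claim 2 of US8990073B2 above recites three steps: search for minima, connect them into a spectral floor, and subtract the floor. A minimal sketch of those steps follows. It is illustrative only and is not taken from the patent or from either reference; the function names, the use of straight-line interpolation between minima, and the choice to leave the floor equal to the spectrum outside the first and last minimum are all assumptions.

```python
def find_minima(spectrum):
    """Return indices of interior local minima of a magnitude spectrum."""
    return [
        i for i in range(1, len(spectrum) - 1)
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]
    ]


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    # Assumption: outside the first/last minimum the floor equals the spectrum,
    # so the residual is zero there.
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum; also return the minima."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


residual, minima = residual_spectrum([4.0, 1.0, 3.0, 1.0, 5.0, 2.0, 6.0])
print(minima)
print(residual)
```

The returned minima also delimit the peaks used by the later claims, since each peak is the piece of the residual spectrum between two successive minima.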

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (band signals) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .
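The correlation map of claim 4 of US8990073B2 above can be sketched as follows: each peak (the bins between two consecutive minima) gets one normalized correlation score against the previous residual spectrum, and that score is written to every bin of the peak. This is a hedged illustration, not the patent's formula; the cosine-style normalization and the function name are assumptions, and the minima are assumed to be supplied by an earlier step.

```python
import math


def correlation_map(cur_res, prev_res, minima):
    """Assign each peak's normalized correlation score to all bins of that peak."""
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur_res[i] * prev_res[i] for i in bins)
        den = math.sqrt(
            sum(cur_res[i] ** 2 for i in bins) * sum(prev_res[i] ** 2 for i in bins)
        )
        # Assumed normalization: cosine similarity; zero if either shape is empty.
        score = num / den if den > 0 else 0.0
        for i in bins:
            cmap[i] = score  # a shared boundary bin takes the later peak's score
    return cmap


cur = [0.0, 0.0, 2.0, 0.0, 3.5, 0.0, 0.0]
prev = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
print(correlation_map(cur, prev, [1, 3, 5]))
```

In this example only the first peak reappears at the same position in the previous frame, so only its bins receive a non-zero score.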

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (band signals) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (band signals) so as to produce a summed long-term correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .
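The long-term correlation map of claims 1 and 5 of US8990073B2 above combines an update factor, the current frame's correlation map, and an initial value, filtered bin by bin through a one-pole filter and then summed. A minimal sketch, assuming the usual first-order recursive form; the symbol `alpha`, its default value, and the zero initial map are hypothetical choices and are not stated in the claims.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing, applied bin by bin."""
    # alpha is the assumed update factor: closer to 1 means slower adaptation.
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cmap)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)


lt_map = [0.0, 0.0, 0.0]  # assumed initial value of the long-term map
for frame_cmap in ([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]):
    lt_map = update_long_term_map(lt_map, frame_cmap, alpha=0.5)
print(lt_map, summed_long_term_map(lt_map))
```

A bin that stays correlated across frames accumulates a large long-term value, while a bin that is correlated only once decays, which is why the summed value can serve as a tonal stability indicator.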

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (band signals) having a magnitude that exceeds a given fixed threshold .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (subband filter) for a next frame .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filter (noise estimates) ing a lowband signal to obtain a plurality of subband signals ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction (linear prediction) residual error energies .
US20060036432A1
CLAIM 4
. The method of claim 3 , in which the step of spectrally whiten includes linear prediction (linear prediction, residual error) and filtering .

US20060036432A1
CLAIM 12
. The method of claim 1 , in which the step of spectrally whiten includes the following steps (second order) : prefiltering a subband signal ;
feeding an output of the prefiltering into a delay chain having a depth depending on a filter order ;
feeding delayed signals and conjugates thereof to a linear prediction block for calculating coefficients ;
keeping coefficients from every L th calculation by a decimator ;
and filtering the subband signals using a filterblock where predicted coefficients are used and updated for every L th sample , where L is a subband sample time step .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signals) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .
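The spectral diversity computation of claim 24 of US8990073B2 above can be sketched by reading the claim literally: a per-band energy ratio between the current and previous frame, restricted to bands above a given index, followed by a weighted sum. The function name, the unit default weights, and the handling of a zero previous-frame energy are assumptions made for illustration.

```python
def spectral_diversity(cur_energy, prev_energy, first_band, weights=None):
    """Weighted sum of current/previous energy ratios over bands >= first_band."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed: unit weights
    total = 0.0
    for b in range(first_band, len(cur_energy)):
        # Assumed guard: a band with no previous energy contributes nothing.
        ratio = cur_energy[b] / prev_energy[b] if prev_energy[b] > 0 else 0.0
        total += weights[b] * ratio
    return total


print(spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], first_band=2))
```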

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signals) into a first group (band signals) of a certain number of first frequency (band signals) bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signals) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .
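The noise character parameter of claim 28 of US8990073B2 above follows a simple recipe: split the bands into a first group and the rest, take the ratio of the two group energies, and track a long-term value. A minimal sketch under stated assumptions: the group energy is taken as a plain sum, and the long-term value uses first-order smoothing with a hypothetical factor `alpha`; neither detail is given in the claim.

```python
def noise_character(band_energy, n_first):
    """Ratio of the first group's energy to the energy of the remaining bands."""
    first = sum(band_energy[:n_first])
    rest = sum(band_energy[n_first:])
    return first / rest if rest > 0 else 0.0  # assumed guard for an empty rest


def long_term_value(previous, current, alpha=0.9):
    """Assumed first-order smoothing of the parameter across frames."""
    return alpha * previous + (1.0 - alpha) * current


param = noise_character([2.0, 2.0, 1.0, 1.0], n_first=2)
print(param, long_term_value(0.0, param, alpha=0.5))
```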

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (band signals) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (band signals) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (band signals) so as to produce a summed long-term correlation map .
US20060036432A1
CLAIM 1
. A method for enhancement of a decoder in an audio source coding system using high-frequency reconstruction , comprising : subband filtering a lowband signal to obtain a plurality of subband signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
and adaptively , spectrally whiten a signal prior to High Frequency Reconstruction or after High Frequency Reconstruction , according to spectral whitening information indicating a required amount of spectral whitening at a given time , in order to obtain a similar tonal character of the highband after the High Frequency Reconstruction as in a highband of an original signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2006094522A

Filed: 2005-09-22     Issued: 2006-04-06

Multi-channel adaptive speech signal processing with noise reduction (ノイズ低減による多重チャンネル適応の音声信号処理)

(Original Assignee) Harman Becker Automotive Systems GmbH

Markus Buck, Tim Haulick, Phil Hetherington, Pierre Zakarauskas
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2006094522A
CLAIM 1
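A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.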

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2006094522A
CLAIM 1
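A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.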

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (ブロッキング) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
JP2006094522A
CLAIM 2
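The system according to claim 1, wherein the means configured to generate the at least one noise reference signal comprises a non-adaptive blocking [ブロッキング] (frequency bin) matrix or an adaptive blocking matrix.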

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (音声信号) .
JP2006094522A
CLAIM 1
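A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.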

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JP2006094522A
CLAIM 1
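A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.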

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
JP2006094522A
CLAIM 1
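A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.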

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JP2006094522A
CLAIM 1
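A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.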

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (音声信号) (サブバンド) is detected .
JP2006094522A
CLAIM 1
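A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.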

JP2006094522A
CLAIM 7
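The system according to any one of claims 1 to 6, wherein the pre-processing means and/or the first processing means and/or the second processing means are configured to perform processing in the time domain, in the frequency domain, or in the subband [サブバンド] (tonal sound signal) frequency domain.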

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (音声信号) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JP2006094522A
CLAIM 1
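A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.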

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (音声信号) based on a frequency dependent signal-to-noise ratio (SNR) .
JP2006094522A
CLAIM 1
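A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals, having means configured to perform time-delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals, having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals, having an adaptive beamformer for obtaining a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal on the basis of the at least one noise reference signal.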

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (音声信号) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JP2006094522A
CLAIM 1
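A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.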

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (音声信号) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2006094522A
CLAIM 1
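A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.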

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JP2006094522A
CLAIM 1
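A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.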

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JP2006094522A
CLAIM 1
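A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.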

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (音声信号) prevents updating of noise energy estimates when a music signal is detected .
JP2006094522A
CLAIM 1
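A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.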

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (音声信号) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JP2006094522A
CLAIM 1
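A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.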
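
As an illustration of the computation recited in claim 24 of US8990073B2 above, the following is a minimal Python sketch, not the patent's implementation. The function name `spectral_diversity`, the uniform default weights, and the small `eps` guard against division by zero are assumptions introduced here; this chart does not quote the patent's actual weights or band limit.

```python
from typing import List, Optional


def spectral_diversity(cur_energy: List[float],
                       prev_energy: List[float],
                       min_band: int,
                       weights: Optional[List[float]] = None,
                       eps: float = 1e-12) -> float:
    """Weighted sum of per-band energy ratios (current frame / previous frame).

    Only bands with index strictly greater than `min_band` contribute,
    mirroring "frequency bands higher than a given number".
    """
    if len(cur_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed: uniform weighting
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[b] / max(prev_energy[b], eps)  # eps is hypothetical
        total += weights[b] * ratio
    return total
```

With per-band energies `[4, 4, 8, 2]` in the current frame, `[1, 2, 4, 4]` in the previous frame, and `min_band = 1`, only bands 2 and 3 contribute.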

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (音声信号) .
JP2006094522A
CLAIM 1
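A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.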

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (音声信号) and the complementary non-stationarity parameter .
JP2006094522A
CLAIM 1
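A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.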

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2006094522A
CLAIM 1
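A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.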

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2006094522A
CLAIM 1
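A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.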

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2006094522A
CLAIM 1
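A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.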

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (ロッキング) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
JP2006094522A
CLAIM 2
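The system according to claim 1, wherein the means configured to generate the at least one noise reference signal comprises a non-adaptive blocking [ブロッキング] (frequency bin) matrix or an adaptive blocking matrix.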

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (音声信号) .
JP2006094522A
CLAIM 1
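A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.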

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JP2006094522A
CLAIM 1
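A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.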

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator (フレーム) of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2006094522A
CLAIM 1
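A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.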

JP2006094522A
CLAIM 10
The system according to any one of claims 1 to 9, further comprising a frame [フレーム] (tonal stability estimator), wherein the microphone array is arranged in a predetermined, in particular fixed, position in or on the frame.

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JP2006094522A
CLAIM 1
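A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.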

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JP2006094522A
CLAIM 1
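A system for speech signal [音声信号] (sound signal) processing with noise reduction, comprising: a microphone array having at least two microphones for detecting microphone signals; pre-processing means connected to the microphone array to receive the microphone signals and having means configured to perform time delay compensation of the microphone signals to obtain pre-processed signals; first signal processing means connected to the pre-processing means to receive the pre-processed signals and having means configured to generate at least one noise reference signal, in particular one noise reference signal for each microphone signal; second signal processing means connected to the pre-processing means to receive the pre-processed signals and having an adaptive beamformer to obtain a beamformed signal; and adaptive noise cancelling means configured to reduce noise in the beamformed signal based on the at least one noise reference signal.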




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2006085176A

Filed: 2005-09-14     Issued: 2006-03-30

Bandwidth extension of band-limited audio signals

(Original Assignee) Harman Becker Automotive Systems GmbH

Bernd Iser, Gerhard Uwe Schmidt
US8990073B2
CLAIM 1
. A method for estimating a tonal stability (ノイズ) of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (ワーク) , and an initial value of the long term correlation map .
JP2006085176A
CLAIM 10
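The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.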

JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (ワーク) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2006085176A
CLAIM 10
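The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.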
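
For orientation, the steps recited in claims 1 and 2 of US8990073B2 above can be sketched in a few lines of Python. This is a hedged illustration only: the helper names, the zero initial long-term map, the update factor `alpha = 0.9`, the straight-line spectral floor, and the clamped normalized correlation are assumptions made here and are not taken from the patent text or from JP2006085176A.

```python
import math
from typing import List


def find_minima(x: List[float]) -> List[int]:
    """Indices of local minima; both end bins are always included."""
    if len(x) < 2:
        raise ValueError("need at least two bins")
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def residual_spectrum(spec: List[float]) -> List[float]:
    """Subtract a floor that connects the spectral minima with straight lines."""
    minima = find_minima(spec)
    floor = [0.0] * len(spec)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spec[a] + t * (spec[b] - spec[a])
    return [s - f for s, f in zip(spec, floor)]


def correlation_map(cur: List[float], prev: List[float]) -> List[float]:
    """Correlate each peak (piece between successive minima of the current
    residual) with the same bins of the previous residual."""
    cmap = [0.0] * len(cur)
    minima = find_minima(cur)
    for a, b in zip(minima, minima[1:]):
        num = sum(cur[i] * prev[i] for i in range(a, b + 1))
        den = sum(cur[i] ** 2 for i in range(a, b + 1)) * sum(
            prev[i] ** 2 for i in range(a, b + 1))
        # Assumed measure: normalized correlation, clamped at zero.
        c = max(0.0, num / math.sqrt(den)) if den > 0.0 else 0.0
        for i in range(a, b + 1):  # a shared boundary bin takes the later value
            cmap[i] = c
    return cmap


def tonal_stability(frames: List[List[float]], alpha: float = 0.9) -> float:
    """Sum of the long-term correlation map after processing all frames."""
    lt = [0.0] * len(frames[0])  # assumed initial value of the long-term map
    prev = residual_spectrum(frames[0])
    for spec in frames[1:]:
        cur = residual_spectrum(spec)
        cmap = correlation_map(cur, prev)
        # Assumed update rule: first-order smoothing with update factor alpha.
        lt = [alpha * l + (1.0 - alpha) * c for l, c in zip(lt, cmap)]
        prev = cur
    return sum(lt)
```

A caller might compare the returned sum with a threshold to flag a tonally stable signal; the threshold value is deliberately left open here.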

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability (ノイズ) of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound (ノイズ) signal is detected .
JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision (の決定) based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2006085176A
CLAIM 2
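The system according to claim 1, wherein the band-limited parameters comprise characteristic parameters for determining a band-limited spectral envelope and/or pitch and/or short-time power and/or high-pass-to-low-pass power ratio and/or signal-to-noise ratio, and the wideband parameters comprise a wideband spectral envelope and/or characteristic parameters for the determination [の決定] (binary decision, update decision) of a wideband spectral envelope and/or a wideband excitation signal.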

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability (ノイズ) of the sound signal prevents updating of noise energy estimates when a music signal is detected .
JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (ワーク) energy and an average frame (ワーク) energy .
JP2006085176A
CLAIM 10
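The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.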

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (ワーク) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JP2006085176A
CLAIM 10
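The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.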

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (特徴パラメータ) indicative of an activity of the sound signal .
JP2006085176A
CLAIM 2
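The system according to claim 1, wherein the band-limited parameters comprise characteristic parameters [特徴パラメータ] (activity prediction parameter) for determining a band-limited spectral envelope and/or pitch and/or short-time power and/or high-pass-to-low-pass power ratio and/or signal-to-noise ratio, and the wideband parameters comprise a wideband spectral envelope and/or characteristic parameters for the determination of a wideband spectral envelope and/or a wideband excitation signal.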

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (特徴パラメータ) comprises : calculating a long-term value of a binary decision (の決定) obtained from estimating the parameter related to the tonal stability (ノイズ) of the sound signal and the complementary non-stationarity parameter .
JP2006085176A
CLAIM 2
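The system according to claim 1, wherein the band-limited parameters comprise characteristic parameters [特徴パラメータ] (activity prediction parameter) for determining a band-limited spectral envelope and/or pitch and/or short-time power and/or high-pass-to-low-pass power ratio and/or signal-to-noise ratio, and the wideband parameters comprise a wideband spectral envelope and/or characteristic parameters for the determination [の決定] (binary decision, update decision) of a wideband spectral envelope and/or a wideband excitation signal.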

JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (特徴パラメータ) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
JP2006085176A
CLAIM 2
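The system according to claim 1, wherein the band-limited parameters comprise characteristic parameters [特徴パラメータ] (activity prediction parameter) for determining a band-limited spectral envelope and/or pitch and/or short-time power and/or high-pass-to-low-pass power ratio and/or signal-to-noise ratio, and the wideband parameters comprise a wideband spectral envelope and/or characteristic parameters for the determination of a wideband spectral envelope and/or a wideband excitation signal.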

US8990073B2
CLAIM 30
. A device for estimating a tonal stability (ノイズ) of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (ワーク) , and an initial value of the long-term correlation map .
JP2006085176A
CLAIM 10
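The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.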

JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 31
. A device for estimating a tonal stability (ノイズ) of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (ワーク) , and an initial value of the long-term correlation map .
JP2006085176A
CLAIM 10
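The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.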

JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (ワーク) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2006085176A
CLAIM 10
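The system according to any one of claims 1 to 9, wherein the mapping means comprises a codebook and/or an artificial neural network [人工ニューラルネットワーク] (current frame, average frame) that provides a correlation between the at least one band-limited parameter and the at least one wideband parameter.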

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability (ノイズ) of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability (ノイズ) estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2006085176A
CLAIM 11
The system according to any one of claims 1 to 10, wherein the audio signal generating means comprises a sine wave generator, or a sine wave generator and a noise [ノイズ] (tonal stability, tonal sound, tonal stability estimation, tonal sound signal, SNR LT, tonal stability parameter estimation, tonal stability estimator) generator.




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2007065226A

Filed: 2005-08-31     Issued: 2007-03-15

Vocal fry detection device and computer program

(Original Assignee) Advanced Telecommunication Research Institute International

Norihiro Hagita, Hiroshi Ishiguro, Carlos Toshinori Ishii
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ) , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2007065226A
CLAIM 3
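The vocal fry detection device according to claim 1 or claim 2, further comprising filtering [フィルタリング] (update factor) means for removing components of the speech signal other than those in a predetermined frequency band before the speech signal is supplied to the first framing means and the second framing means.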

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (検出手段, 検出装置) .
JP2007065226A
CLAIM 1
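A vocal fry detection device [検出装置] (sound activity detection) for detecting vocal fry segments in a speech signal, comprising: first framing means for framing the speech signal into first frames having a first frame length and a first frame shift; power peak detection means [検出手段] (sound activity detection) for detecting power peaks in each of the series of first frames output by the first framing means; second framing means for framing the speech signal into second frames having a second frame length greater than the first frame length and a second frame shift greater than the first frame shift; periodicity determination means for determining the presence or absence of periodicity within each of the series of second frames output by the second framing means; power peak selection means for selecting, from among the power peaks detected by the power peak detection means, the power peaks within second frames determined by the periodicity determination means to have no periodicity; and means for searching, for each power peak selected by the power peak selection means, for a power peak whose cross-correlation with another power peak within a predetermined interval containing that power peak exceeds a predetermined threshold, and for detecting a predetermined interval of the speech signal containing that power peak as a vocal fry segment.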

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段, 検出装置) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
JP2007065226A
CLAIM 1
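A vocal fry detection device [検出装置] (sound activity detection) for detecting vocal fry segments in a speech signal, comprising: first framing means for framing the speech signal into first frames having a first frame length and a first frame shift; power peak detection means [検出手段] (sound activity detection) for detecting power peaks in each of the series of first frames output by the first framing means; second framing means for framing the speech signal into second frames having a second frame length greater than the first frame length and a second frame shift greater than the first frame shift; periodicity determination means for determining the presence or absence of periodicity within each of the series of second frames output by the second framing means; power peak selection means for selecting, from among the power peaks detected by the power peak detection means, the power peaks within second frames determined by the periodicity determination means to have no periodicity; and means for searching, for each power peak selected by the power peak selection means, for a power peak whose cross-correlation with another power peak within a predetermined interval containing that power peak exceeds a predetermined threshold, and for detecting a predetermined interval of the speech signal containing that power peak as a vocal fry segment.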

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段, 検出装置) comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
JP2007065226A
CLAIM 1
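A vocal fry detection device [ボーカル・フライ検出装置] (sound activity detection) for detecting vocal fry segments in a speech signal, comprising: first framing means for framing the speech signal into first frames having a first frame length and a first frame shift; power peak detection means [パワーピーク検出手段] (sound activity detection) for detecting the power peaks of each of the series of first frames output by the first framing means; second framing means for framing the speech signal into second frames having a second frame length greater than the first frame length and a second frame shift greater than the first frame shift; periodicity determination means for determining the presence or absence of periodicity within each of the series of second frames output by the second framing means; power peak selection means for selecting, from among the power peaks detected by the power peak detection means, the power peaks within the second frames determined by the periodicity determination means to have no periodicity; and means for searching, for each power peak selected by the power peak selection means, for a power peak whose cross-correlation with another power peak within a predetermined interval containing that power peak exceeds a predetermined threshold, and for detecting a predetermined interval of the speech signal containing that power peak as a vocal fry segment.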

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段, 検出装置) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JP2007065226A
CLAIM 1
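A vocal fry detection device [ボーカル・フライ検出装置] (sound activity detection) for detecting vocal fry segments in a speech signal, comprising: first framing means for framing the speech signal into first frames having a first frame length and a first frame shift; power peak detection means [パワーピーク検出手段] (sound activity detection) for detecting the power peaks of each of the series of first frames output by the first framing means; second framing means for framing the speech signal into second frames having a second frame length greater than the first frame length and a second frame shift greater than the first frame shift; periodicity determination means for determining the presence or absence of periodicity within each of the series of second frames output by the second framing means; power peak selection means for selecting, from among the power peaks detected by the power peak detection means, the power peaks within the second frames determined by the periodicity determination means to have no periodicity; and means for searching, for each power peak selected by the power peak selection means, for a power peak whose cross-correlation with another power peak within a predetermined interval containing that power peak exceeds a predetermined threshold, and for detecting a predetermined interval of the speech signal containing that power peak as a vocal fry segment.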

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段, 検出装置) further comprises updating the noise estimates for a next frame .
JP2007065226A
CLAIM 1
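A vocal fry detection device [ボーカル・フライ検出装置] (sound activity detection) for detecting vocal fry segments in a speech signal, comprising: first framing means for framing the speech signal into first frames having a first frame length and a first frame shift; power peak detection means [パワーピーク検出手段] (sound activity detection) for detecting the power peaks of each of the series of first frames output by the first framing means; second framing means for framing the speech signal into second frames having a second frame length greater than the first frame length and a second frame shift greater than the first frame shift; periodicity determination means for determining the presence or absence of periodicity within each of the series of second frames output by the second framing means; power peak selection means for selecting, from among the power peaks detected by the power peak detection means, the power peaks within the second frames determined by the periodicity determination means to have no periodicity; and means for searching, for each power peak selected by the power peak selection means, for a power peak whose cross-correlation with another power peak within a predetermined interval containing that power peak exceeds a predetermined threshold, and for detecting a predetermined interval of the speech signal containing that power peak as a vocal fry segment.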

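To make the SNR-based detection recited in claims 12 to 16 of US8990073B2 above easier to compare against the cited reference, the following is a minimal Python sketch of one frame of such a detector. It is only an illustration under stated assumptions: the function name `snr_vad_frame`, the linear threshold `slope * snr_lt + offset`, the smoothing constant `beta`, and the rule that noise estimates are updated only on inactive frames are all hypothetical and are not taken from the claim text.

```python
import math

def snr_vad_frame(band_energy, noise_est, snr_lt,
                  slope=0.1, offset=3.0, beta=0.9):
    """One frame of a hypothetical SNR-based sound activity detector.

    band_energy: per-band energies of the current frame
    noise_est:   per-band noise energy estimates carried over from the previous frame
    snr_lt:      long-term SNR in dB (assumed to be tracked elsewhere)
    """
    # Frequency-dependent SNR in dB, using the previous frame's noise estimates.
    snr_db = [10.0 * math.log10(max(e, 1e-12) / max(n, 1e-12))
              for e, n in zip(band_energy, noise_est)]
    snr_av = sum(snr_db) / len(snr_db)

    # Threshold as a function of the long-term SNR (linear form is an assumption).
    threshold = slope * snr_lt + offset
    active = snr_av > threshold

    # Update the noise estimates for the next frame (assumed: only when inactive).
    if active:
        next_noise = list(noise_est)
    else:
        next_noise = [beta * n + (1.0 - beta) * e
                      for n, e in zip(noise_est, band_energy)]
    return active, next_noise
```

Calling `snr_vad_frame([100.0, 100.0], [1.0, 1.0], snr_lt=20.0)` reports an active frame and leaves the noise estimates unchanged, since the average SNR of 20 dB exceeds the assumed threshold of 5 dB.
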
US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007065226A
CLAIM 3
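The vocal fry detection device according to claim 1 or claim 2, further comprising filtering means [フィルタリング手段] (update factor) for removing, before the speech signal is supplied to the first framing means and the second framing means, components of the speech signal other than those in a predetermined frequency band.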

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2007065226A
CLAIM 3
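The vocal fry detection device according to claim 1 or claim 2, further comprising filtering means [フィルタリング手段] (update factor) for removing, before the speech signal is supplied to the first framing means and the second framing means, components of the speech signal other than those in a predetermined frequency band.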




US20070050189A1

Filed: 2005-08-31     Issued: 2007-03-01

Method and apparatus for comfort noise generation in speech communication systems

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Google Technology Holdings LLC

Edgardo Cruz-Zeno, James Ashley
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

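For orientation when reading claim 1 of US8990073B2 above, the correlation-map and long-term-map steps can be sketched in Python as follows. This is an illustrative sketch only: the use of a normalized cross-correlation per peak, the first-order recursive update `alpha * lt + (1 - alpha) * cmap`, the value of `alpha`, and both function names are assumptions; the claim recites only that the long-term map is based on an update factor, the current correlation map, and an initial value.

```python
import math

def correlation_map(cur, prev, peak_bounds):
    """Per-bin correlation map.

    cur, prev:   current and previous residual spectra (equal length)
    peak_bounds: list of (lo, hi) bin indices, one pair per detected peak
    Every bin of a peak receives the normalized correlation between that
    piece of `cur` and the same bins of `prev` (normalization is assumed).
    """
    cmap = [0.0] * len(cur)
    for lo, hi in peak_bounds:
        x = cur[lo:hi + 1]
        y = prev[lo:hi + 1]
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        value = num / den if den > 0.0 else 0.0
        for i in range(lo, hi + 1):
            cmap[i] = value
    return cmap

def update_long_term(lt_map, cmap, alpha=0.9):
    """Recursive update of the long-term map; alpha plays the role of the update factor."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]
```

Starting from an initial long-term map of zeros, repeated calls to `update_long_term` move each bin toward 1 when the same peak shape recurs frame after frame, which is the behaviour one would expect of a tonally stable signal under these assumptions.
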
US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

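The residual-spectrum steps recited in claim 2 of US8990073B2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched in Python as below. The sketch is hedged: treating the two end bins as minima, using straight-line interpolation to connect the minima, and all function names are assumptions made for illustration and do not come from the claim text. The input is assumed to contain at least two bins.

```python
def find_minima(spectrum):
    """Indices of local minima; the first and last bins are always included (assumed)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx

def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (linear interpolation assumed)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor

def residual_spectrum(spectrum):
    """Subtract the estimated spectral floor from the spectrum."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima

def detect_peaks(residual):
    """Each peak is the piece of the residual between two successive minima."""
    minima = find_minima(residual)
    return list(zip(minima, minima[1:]))
```

For the toy spectrum `[1.0, 3.0, 1.0, 4.0, 2.0]` this yields the residual `[0.0, 2.0, 0.0, 2.5, 0.0]` and two peaks spanning bins 0 to 2 and 2 to 4.
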
US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (noise character) parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20070050189A1
CLAIM 1
. A method for comfort noise generation in a speech communication system , comprising : receiving a plurality of information frames indicative of speech plus background noise ;
estimating one or more background noise characteristics (noise character) based on the plurality of information frames ;
and generating a comfort noise signal based on the one or more background noise characteristics .

US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

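The spectral diversity calculation recited in claim 24 of US8990073B2 above can be illustrated with a short Python sketch. The function name, the uniform default weights, and the small floor that guards against division by zero are assumptions for illustration; the claim does not specify the weights or the direction of any normalization beyond a ratio of current-frame to previous-frame energy.

```python
def spectral_diversity(cur_energy, prev_energy, given_band, weights=None):
    """Weighted sum of per-band energy ratios for bands with index > given_band.

    cur_energy, prev_energy: per-band energies of the current and previous frame
    given_band:              bands with an index higher than this number are used
    weights:                 optional per-band weights (uniform if omitted, assumed)
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for i in range(given_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], 1e-12)
        total += weights[i] * ratio
    return total
```

With `cur_energy=[1, 2, 4, 8]`, `prev_energy=[1, 1, 2, 2]` and `given_band=1`, only bands 2 and 3 contribute, giving 4/2 + 8/2 = 6.0.
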
US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame, speech signal) indicative of an activity of the sound signal .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame, speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (noise character) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20070050189A1
CLAIM 1
. A method for comfort noise generation in a speech communication system , comprising : receiving a plurality of information frames indicative of speech plus background noise ;
estimating one or more background noise characteristics (noise character) based on the plurality of information frames ;
and generating a comfort noise signal based on the one or more background noise characteristics .

US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

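The noise character parameter of claim 28 of US8990073B2 above (split the bands into two groups, take the energy of each group, form their ratio, then track a long-term value) can be sketched in Python as follows. The function names, the use of a plain sum as the group energy, the first-order smoothing, and the constant `alpha` are assumptions for illustration only; the claim does not fix any of them.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    rest = sum(band_energy[n_first:])
    return first / max(rest, 1e-12)  # floor avoids division by zero (assumed)

def long_term_value(prev_lt, value, alpha=0.9):
    """Long-term tracking by first-order recursive smoothing (assumed form)."""
    return alpha * prev_lt + (1.0 - alpha) * value
```

For `band_energy=[4, 4, 1, 1]` and `n_first=2` the ratio is 8/2 = 4.0. Under claim 29, a detector built this way would block the noise update whenever the tracked value falls below a fixed threshold, whose value is not given in the claim.
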
US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (noise character) parameter inferior than a given fixed threshold .
US20070050189A1
CLAIM 1
. A method for comfort noise generation in a speech communication system , comprising : receiving a plurality of information frames indicative of speech plus background noise ;
estimating one or more background noise characteristics (noise character) based on the plurality of information frames ;
and generating a comfort noise signal based on the one or more background noise characteristics .

US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames (activity prediction parameter, noise character parameter) of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US20070050189A1
CLAIM 8
. The method according to claim 1 , further comprising : generating a speech signal (activity prediction parameter, noise character parameter) from the plurality of information frames ;
and generating an output signal by switching between the comfort noise signal and the speech signal based on a voice activity detection .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20070050189A1
CLAIM 2
. The method according to claim 1 , wherein the step of estimating a background noise characteristic comprises : consecutively determining a current estimated background noise energy value for each of a plurality of frequency channels of a current frame (current frame) of the plurality of information frames from estimated background noise energy values for corresponding frequency channels of previous frames of the plurality of information frames and estimated channel energy values for corresponding frequency channels of the current frame .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (noise character) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20070050189A1
CLAIM 1
. A method for comfort noise generation in a speech communication system , comprising : receiving a plurality of information frames indicative of speech plus background noise ;
estimating one or more background noise characteristics (noise character) based on the plurality of information frames ;
and generating a comfort noise signal based on the one or more background noise characteristics .




US20060053007A1

Filed: 2005-08-29     Issued: 2006-03-09

Detection of voice activity in an audio signal

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Solutions and Networks Oy

Riitta Niemisto
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum (frequency spectrum) of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum (frequency spectrum) of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (previous frame, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\, x(t-1)}{\sum x(t)^2}$ .

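The predictor coefficient recited in claim 6 of US20060053007A1 above is a lag-one autocorrelation normalized by the signal energy, and it can be computed with a few lines of Python. The function name and the summation range (all samples of the supplied buffer, with the lag-one product starting at the second sample) are assumptions for illustration, since the claim does not state the summation limits.

```python
def predictor_coefficient(x):
    """First-order predictor coefficient a = sum(x(t) x(t-1)) / sum(x(t)^2).

    x is a buffer of samples, for example a current and a previous frame
    concatenated (the buffer layout is an assumption).
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(v * v for v in x)
    return num / den if den else 0.0
```

A slowly varying (lowpass) buffer such as `[1, 1, 1, 1]` gives a positive coefficient of 0.75, while a rapidly alternating (highpass) buffer such as `[1, -1, 1, -1]` gives -0.75, which is the kind of sign information a highpass test like the one in claim 1 of the reference could use.
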
US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame, speech signal) indicative of an activity of the sound signal .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\, x(t-1)}{\sum x(t)^2}$ .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame, speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\, x(t-1)}{\sum x(t)^2}$ .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\,x(t-1)}{\sum x(t)^{2}}$ .
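
The update gating recited in claim 27 of US8990073B2 above can be sketched as a simple predicate. The function name and the threshold values are hypothetical; the claim only requires two fixed thresholds that are exceeded simultaneously.

```python
def noise_update_prevented(activity_prediction, non_stationarity,
                           first_threshold=0.8, second_threshold=0.5):
    """True when both parameters exceed their fixed thresholds at once.

    The default thresholds are illustrative assumptions, not values
    taken from the patent.
    """
    return (activity_prediction > first_threshold
            and non_stationarity > second_threshold)
```

Under this reading, exceeding only one of the two thresholds does not by itself block the noise energy update.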

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (previous frame, speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\,x(t-1)}{\sum x(t)^{2}}$ .
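
The band-energy ratio recited in claim 28 of US8990073B2 above can be sketched as follows. The function names, the direction of the ratio (first group over second group), and the one-pole form of the long-term value with forgetting factor `alpha` are assumptions; the claim does not fix any of them.

```python
def noise_character(band_energies, n_first):
    """Ratio of the first-group energy to the second-group energy."""
    first = sum(band_energies[:n_first])    # first n_first bands
    second = sum(band_energies[n_first:])   # the rest of the bands
    # Assumption: an empty or silent second group maps to infinity.
    return first / second if second > 0.0 else float("inf")


def long_term_value(previous, current, alpha=0.9):
    """Assumed one-pole smoothing of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * current
```

For example, band energies `[2.0, 2.0, 1.0, 1.0]` split after the second band give a ratio of 2.0.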

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (previous frame, speech signal) inferior than a given fixed threshold .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal (activity prediction parameter, noise character parameter) using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US20060053007A1
CLAIM 6
. The device according to claim 1 , wherein the voice activity detector (6) is adapted to calculate a first order predictor $A(z) = 1 - a z^{-1}$ corresponding to a current and a previous frame (activity prediction parameter, noise character parameter) of the digital data , in which the predictor coefficient $a$ is computed by $a = \frac{\sum x(t)\,x(t-1)}{\sum x(t)^{2}}$ .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum (frequency spectrum) of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum (frequency spectrum) of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060053007A1
CLAIM 1
. A device comprising a voice activity detector for detecting voice activity in a speech signal using digital data formed on the basis of samples of an audio signal , the voice activity detector of the device comprising : a first element adapted to examine , whether the signal has a highpass nature ;
and a second element adapted to examine the frequency spectrum (frequency spectrum) of the signal ;
wherein the voice activity detector is adapted to provide an indication of speech when one of the following conditions is fulfilled : the first element has determined that the signal has a highpass nature ;
or the second element has determined that the signal does not have a flat frequency response .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050278174A1

Filed: 2005-07-20     Issued: 2005-12-15

Audio coder

(Original Assignee) Fujitsu Ltd     (Current Assignee) Fujitsu Ltd

Hitoshi Sasaki, Yasuji Ota
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (reproduced signals) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (reproduced signals) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
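
The three steps recited in claim 2 of US8990073B2 above (search for minima, connect them into a spectral floor, subtract) can be sketched as follows. The function names, the treatment of the two end bins as anchor points, and the straight-line connection between minima are assumptions for illustration; the claim only says that the minima are connected with each other.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are added as anchors (assumption)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor joining the minima.

    Expects at least two bins. Returns (residual, minima).
    """
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for lo, hi in zip(minima, minima[1:]):
        for i in range(lo, hi + 1):
            w = (i - lo) / (hi - lo)  # 0 at the left minimum, 1 at the right
            floor[i] = (1.0 - w) * spectrum[lo] + w * spectrum[hi]
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

In this sketch the residual is zero at every minimum, so each stretch between two successive minima corresponds to one candidate peak.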

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (sampled values) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
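
The correlation map recited in claim 4 of US8990073B2 above can be sketched as follows. Here `minima` is assumed to be a sorted list of bin indices that delimit the peaks of the current residual spectrum, and the cosine-style normalization is one plausible reading of "normalized correlation value", not necessarily the formula used in the patent's description.

```python
import math


def correlation_map(current, previous, minima):
    """Per-peak normalized correlation, written over the bins of each peak."""
    cmap = [0.0] * len(current)
    for lo, hi in zip(minima, minima[1:]):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Assumption: a peak with no energy in either frame scores 0.
        score = num / den if den > 0.0 else 0.0
        # Assign the peak's score to every bin between its two minima.
        for i in range(lo, hi + 1):
            cmap[i] = score
    return cmap
```

A bin shared by two neighboring peaks (the minimum between them) takes the score of the later peak in this sketch, which is an arbitrary choice.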

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (sampled values) so as to produce a summed long-term correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
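
The one-pole filtering and summation recited in claim 5 of US8990073B2 above can be sketched as follows. The function names and the value of the update factor `alpha` are assumptions; the long-term map is assumed to start from some initial value such as all zeros.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """Bin-by-bin one-pole filter: y[k] <- alpha * y[k] + (1 - alpha) * c[k]."""
    return [alpha * y + (1.0 - alpha) * c for y, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Calling `update_long_term_map` once per frame keeps a slowly varying per-bin estimate, and the summed value grows when many bins stay correlated from frame to frame.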

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (reproduced signals) .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (reproduced signals) comprises searching in the correlation map for frequency bins (sampled values) having a magnitude that exceeds a given fixed threshold .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (reproduced signals) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (reproduced signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (reproduced signals) is detected .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (reproduced signals) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (reproduced signals) based on a frequency dependent signal-to-noise ratio (SNR) .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (reproduced signals) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (reproduced signals) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (reproduced signals) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (reproduced signals) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (reproduced signals) prevents updating of noise energy estimates when a music signal is detected .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (reproduced signals) in a current frame and an energy of the sound signal in a previous frame , for frequency bands (sampled values) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
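
The spectral diversity computation recited in claim 24 of US8990073B2 above can be sketched as follows. The function name, the uniform default weights, and the small `eps` guard against a zero previous-frame energy are assumptions; the claim does not state the weights.

```python
def spectral_diversity(cur_energy, prev_energy, given_band,
                       weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above given_band."""
    bands = range(given_band + 1, len(cur_energy))
    if weights is None:
        weights = [1.0] * len(bands)  # assumption: uniform weighting
    return sum(w * cur_energy[i] / max(prev_energy[i], eps)
               for w, i in zip(weights, bands))
```

With per-band energies `[1, 2, 4, 8]` in the current frame, `[1, 1, 2, 2]` in the previous frame, and `given_band=1`, only bands 2 and 3 contribute, giving 2 + 4 = 6 under uniform weights.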

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (reproduced signals) .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (reproduced signals) and the complementary non-stationarity parameter .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (sampled values) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
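
Illustrative note on claim 28 of US8990073B2: the claim splits the frequency bands into two groups, forms a ratio of the two group energies, and then tracks a long-term value of that ratio. The sketch below is one way to read those steps. The ratio direction (first group over second), the exponential smoothing used for the long-term value, and the parameter names `n_first`, `alpha` and `eps` are assumptions, not taken from the source.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])    # first group of bands
    second = sum(band_energies[n_first:])   # second group: the remaining bands
    return first / max(second, eps)


def update_long_term(prev_long_term, value, alpha=0.9):
    """Assumed exponential smoothing of the per-frame noise character value."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

For example, `noise_character([2, 2, 1, 1], 2)` gives 4 / 2 = 2.0, and `update_long_term(0.0, 2.0, alpha=0.5)` moves the long-term value halfway to 1.0.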

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (reproduced signals) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .
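
Illustrative note on claim 30 of US8990073B2: the first two elements locate minima in the spectrum, subtract a spectral floor defined by those minima, and treat each piece of the residual between successive minima as a peak. The sketch below assumes a piecewise-linear floor through the minima and treats the first and last bins as boundary minima. Both choices, and all function names, are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are assumed to be minima."""
    if len(spectrum) < 2:
        raise ValueError("need at least two bins")
    minima = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i - 1] > spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(len(spectrum) - 1)
    return minima


def residual_spectrum(spectrum, minima):
    """Subtract a floor that linearly connects successive minima (assumption)."""
    residual = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor = spectrum[a] + t * (spectrum[b] - spectrum[a])
            residual[i] = spectrum[i] - floor
    return residual


def detect_peaks(minima):
    """Each peak is the span between a pair of successive minima."""
    return list(zip(minima, minima[1:]))
```

For the spectrum `[1, 3, 1, 4, 2]` the minima are at bins 0, 2 and 4, the residual is `[0, 2, 0, 2.5, 0]`, and the two peaks span bins (0, 2) and (2, 4).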

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (reproduced signals) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (reproduced signals) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (sampled values) so as to produce a summed long-term correlation map .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values (frequency bins, frequency bands, first frequency bands) and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (reproduced signals) .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (reproduced signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (reproduced signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (reproduced signals) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (reproduced signals) .
US20050278174A1
CLAIM 1
. An audio coder for coding an audio signal , the coder comprising : a candidate code storage section for storing , at the time of determining a code corresponding to a sampled value of the audio signal , a plurality of combinations of candidate codes in a neighborhood interval of the sampled value ;
a decoded signal generation section for generating reproduced signals (sound signal, sound signal prevents updating) by decoding the codes stored in the candidate code storage section ;
and an error evaluation section for calculating , for each candidate code , a sum of squares of differentials between input sampled values and reproduced signals , detecting a combination of candidate codes by which a smallest sum is obtained , that is to say , which minimizes a quantization error , and outputting a code included in the detected combination of candidate codes .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060171419A1

Filed: 2005-05-05     Issued: 2006-08-03

Method for discontinuous transmission and accurate reproduction of background noise information

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Serafin Spindola, Peter Black, Rohit Kapoor
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (packet format) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter (frequency bins) having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .
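
Illustrative note on claim 4 of US8990073B2: for each detected peak, a normalized correlation with the previous residual spectrum is computed over the bins between the two delimiting minima, and that single score is written to every bin of the peak. The sketch below assumes one common normalization (squared cross-product divided by the product of the two energies). The exact formula, the function name, and the zero-energy fallback are assumptions.

```python
def correlation_map(curr_res, prev_res, peaks):
    """Per-bin map holding each peak's normalized correlation score.

    curr_res, prev_res: current and previous residual spectra (same length).
    peaks: list of (start, end) bin indices, inclusive, delimiting each peak.
    """
    cor_map = [0.0] * len(curr_res)
    for a, b in peaks:
        bins = range(a, b + 1)
        cross = sum(curr_res[i] * prev_res[i] for i in bins)
        e_curr = sum(curr_res[i] ** 2 for i in bins)
        e_prev = sum(prev_res[i] ** 2 for i in bins)
        denom = e_curr * e_prev
        # Assumed fallback: a peak with no energy in either frame scores 0.
        score = (cross * cross) / denom if denom > 0.0 else 0.0
        for i in bins:
            cor_map[i] = score  # a shared boundary bin takes the later peak's score
    return cor_map
```

A peak whose shape is unchanged from the previous frame scores 1.0, while a peak with no counterpart in the previous residual scores 0.0.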

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (packet format) so as to produce a summed long-term correlation map .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter (frequency bins) having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .
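
Illustrative note on claim 5 of US8990073B2: the long-term correlation map is obtained by passing the per-frame correlation map through a one-pole filter, bin by bin, and then summing the filtered map over the bins. The sketch below is a generic one-pole (exponential) recursion. The update factor `alpha` and the function name are hypothetical values chosen for illustration.

```python
def update_long_term_map(long_term_map, cor_map, alpha=0.9):
    """One-pole filter per frequency bin, followed by a sum over all bins.

    long_term_map: filtered map from the previous frame (its initial value on
                   the first frame).
    cor_map: correlation map of the current frame.
    alpha: assumed update factor in [0, 1); larger values mean slower tracking.
    Returns the new long-term map and its sum over the bins.
    """
    new_map = [alpha * lt + (1.0 - alpha) * c
               for lt, c in zip(long_term_map, cor_map)]
    return new_map, sum(new_map)
```

Starting from an all-zero map with `alpha=0.5`, a frame with `cor_map=[1, 1, 0]` yields the map `[0.5, 0.5, 0.0]` and a summed value of 1.0.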

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (packet format) having a magnitude that exceeds a given fixed threshold .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter (frequency bins) having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .
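
Illustrative note on claim 7 of US8990073B2: strong tones are detected by searching the correlation map for bins whose magnitude exceeds a fixed threshold. A minimal sketch follows. The function name and the threshold value used in the example are hypothetical.

```python
def find_strong_tone_bins(cor_map, threshold):
    """Return the indices of bins whose map value exceeds a fixed threshold."""
    return [i for i, value in enumerate(cor_map) if value > threshold]


def has_strong_tone(cor_map, threshold):
    """True when at least one bin exceeds the threshold."""
    return len(find_strong_tone_bins(cor_map, threshold)) > 0
```

With `cor_map=[0.2, 0.99, 0.5, 0.97]` and a threshold of 0.95, bins 1 and 3 are reported.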

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (said model) for a next frame .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model (noise estimates, noise estimator) parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (signal processor) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor (second group, sound activity detector) having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (packet format) so as to produce a summed long-term correlation map .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter (frequency bins) having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (signal processor) .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor (second group, sound activity detector) having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (signal processor) comprises a comparator of an average signal to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor (second group, sound activity detector) having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .
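
The comparator recited in claim 38 of US8990073B2 above can be illustrated with a minimal sketch. The linear form of the threshold and the `slope` and `offset` values are hypothetical; the claim only requires that the threshold be some function of the long-term SNR.

```python
def snr_threshold(snr_lt, slope=0.5, offset=2.0):
    """Threshold as a function of the long-term SNR (linear form is an assumption)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR of the frame against the SNR_LT-dependent threshold."""
    return snr_av > snr_threshold(snr_lt)


# With a long-term SNR of 20 dB the assumed threshold is 12 dB.
print(is_active(15.0, 20.0))
print(is_active(10.0, 20.0))
```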

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (said model) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (signal processor) .
US20060171419A1
CLAIM 30
. The apparatus for communicating background noise , according to claim 28 , wherein said encoder comprises : signal processor (second group, sound activity detector) having at least one input and at least one output ;
a model estimator having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said signal processor ;
a rate determinator having at least one input and at least one output , wherein said at least one input is operably connected to a first of said at least one outputs of said model (noise estimates, noise estimator) parameter estimator ;
a ⅛ rate encoder having at least one input and at least one output ;
a full rate encoder having at least one input and at least one output ;
a first switch having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said model parameter estimator , a first of said at least one outputs is operably connected to said at least one input of said ⅛ rate encoder and a second of said at least one outputs is operably connected to said at least one input of said full rate encoder ;
a second switch having at least one input and at least one output , wherein a first of said at least one inputs is operably connected to said at least one output of said ⅛ rate encoder and a second of said at least one inputs is operably connected to said at least one output of said full rate encoder ;
and a packet formatter having at least one input and at least one output , wherein said at least one input is operably connected to said at least one output of said second switch .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7216074B2

Filed: 2005-04-25     Issued: 2007-05-08

System for bandwidth extension of narrow-band speech

(Original Assignee) AT&T Corp     (Current Assignee) Nuance Communications Inc

David Malah, Richard Vandervoort Cox
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (sampling rate) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US7216074B2
CLAIM 4
. The system of claim 1 , wherein the module that generates the second signal further generates the second signal by combining the second signal with the first signal interpolated to a second signal sampling rate (frequency bin basis) .
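
For orientation, the two steps of claim 5 of US8990073B2 above (one-pole filtering per frequency bin, then summing over the bins) might be sketched as follows. The update factor `alpha` and the all-zero initial map are assumptions chosen for illustration, not values taken from the patent.

```python
def update_long_term_map(prev_lt_map, cor_map, alpha=0.9):
    """One-pole recursive smoothing, applied independently to every frequency bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(prev_lt_map, cor_map)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all bins to obtain a single scalar."""
    return sum(lt_map)


# Two identical frames, starting from an assumed all-zero initial map.
lt_map = [0.0, 0.0, 0.0]
for frame_cor_map in ([1.0, 0.0, 0.5], [1.0, 0.0, 0.5]):
    lt_map = update_long_term_map(lt_map, frame_cor_map, alpha=0.5)
total = summed_long_term_map(lt_map)
print(lt_map, total)
```

A bin that stays correlated from frame to frame drives its long-term value toward 1, which is why the summed value can serve as an indicator of tonal stability.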

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimation is performed according to claim 1 .
US7216074B2
CLAIM 10
. The computer-readable medium of claim 9 , wherein first coefficients (background noise signal) are narrowband coefficients and the second area coefficients are wideband area coefficients .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first coefficient) and prevent update of noise energy estimates on the music signal .
US7216074B2
CLAIM 10
. The computer-readable medium of claim 9 , wherein first coefficients (background noise signal) are narrowband coefficients and the second area coefficients are wideband area coefficients .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signal) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US7216074B2
CLAIM 2
. The system of claim 1 , wherein the first signal is a narrowband signal (frequency bands, first frequency bands) and second signal is a wideband signal .
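
The spectral diversity computation of claim 24 of US8990073B2 above can be sketched under stated assumptions. The uniform default weights, the skipping of bands with zero previous energy, and the name `min_band` are hypothetical; the claim itself fixes neither the weights nor the band threshold.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of current/previous energy ratios over bands above min_band."""
    bands = range(min_band + 1, len(cur_energy))
    if weights is None:
        weights = {b: 1.0 for b in bands}  # uniform weights: an assumption
    total = 0.0
    for b in bands:
        if prev_energy[b] > 0.0:  # skip bands with no previous energy (assumption)
            total += weights[b] * cur_energy[b] / prev_energy[b]
    return total


# Bands 2 and 3 contribute ratios 8/4 and 3/6.
print(spectral_diversity([1.0, 2.0, 8.0, 3.0], [1.0, 1.0, 4.0, 6.0], min_band=1))
```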

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signal) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US7216074B2
CLAIM 2
. The system of claim 1 , wherein the first signal is a narrowband signal (frequency bands, first frequency bands) and second signal is a wideband signal .
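
The four steps of claim 28 of US8990073B2 above (grouping the bands, computing two energies, taking their ratio, and tracking a long-term value) might look like the sketch below. The recursive smoothing used for the long-term value and the factor `alpha` are assumptions; the claim only says the long-term value is based on the calculated parameter.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0.0 else float("inf")


def long_term(prev_value, value, alpha=0.9):
    """Long-term tracking by first-order smoothing (the form is an assumption)."""
    return alpha * prev_value + (1.0 - alpha) * value


ratio = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
smoothed = long_term(0.0, ratio, alpha=0.5)
print(ratio, smoothed)
```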

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (sampling rate) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US7216074B2
CLAIM 4
. The system of claim 1 , wherein the module that generates the second signal further generates the second signal by combining the second signal with the first signal interpolated to a second signal sampling rate (frequency bin basis) .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US7216074B2
CLAIM 10
. The computer-readable medium of claim 9 , wherein first coefficients (background noise signal) are narrowband coefficients and the second area coefficients are wideband area coefficients .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US7216074B2
CLAIM 10
. The computer-readable medium of claim 9 , wherein first coefficients (background noise signal) are narrowband coefficients and the second area coefficients are wideband area coefficients .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first coefficient) and preventing update of noise energy estimates .
US7216074B2
CLAIM 10
. The computer-readable medium of claim 9 , wherein first coefficients (background noise signal) are narrowband coefficients and the second area coefficients are wideband area coefficients .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060241937A1

Filed: 2005-04-21     Issued: 2006-10-26

Method and apparatus for automatically discriminating information bearing audio segments and background noise audio segments

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Motorola Solutions Inc

Changxue Ma
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (joint time) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time (frequency spectrum) frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (joint time) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time (frequency spectrum) frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .
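
The residual spectrum construction of claim 2 of US8990073B2 above (find the minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. Linear interpolation between minima and the inclusion of the two end bins as minima are assumptions; the claim says only that the minima are connected with each other.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included so the floor spans all bins."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Piecewise-linear floor obtained by connecting successive minima."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Spectrum minus its floor; expects at least two bins."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]


print(residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0]))
```

The residual is zero at every minimum by construction, so what remains are the spectral peaks standing above the floor.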

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (frequency analysis) is detected .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (frequency analysis) from a background noise signal and prevent update of noise energy estimates on the music signal .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20060241937A1
CLAIM 5
. The method according to claim 1 further comprising : processing the series of samples to obtain a plurality of measurements of the magnitude corresponding to a plurality of frequency bands (frequency bands) ;
computing a variance of the plurality of measurements of magnitude ;
and inputting the variance of the plurality of measurements of magnitude to the decision function .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (frequency bands) into a first group of a certain number of first frequency (different times) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20060241937A1
CLAIM 5
. The method according to claim 1 further comprising : processing the series of samples to obtain a plurality of measurements of the magnitude corresponding to a plurality of frequency bands (frequency bands) ;
computing a variance of the plurality of measurements of magnitude ;
and inputting the variance of the plurality of measurements of magnitude to the decision function .

US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times (first frequency) and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (joint time) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time (frequency spectrum) frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (joint time) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time (frequency spectrum) frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (joint time) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time (frequency spectrum) frequency analysis on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (frequency analysis) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (frequency analysis) from a background noise signal and preventing update of noise energy estimates .
US20060241937A1
CLAIM 7
. The method according to claim 1 further comprising : performing joint time frequency analysis (music signal) on the series of samples to compute a plurality of time-frequency magnitudes that includes magnitudes corresponding to different times and magnitudes corresponding to different frequencies ;
computing a variance of the time-frequency magnitudes ;
and inputting the variance of the time-frequency magnitudes to the decision function .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050246164A1

Filed: 2005-04-15     Issued: 2005-11-03

Coding of audio signals

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Oyj

Pasi Ojala, Jari Makinen, Ari Lakaniemi
US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20050246164A1
CLAIM 5
. The encoder according to claim 4 , wherein the value defined for said time (noise character parameter) parameter is 320 ms .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (time t) bands and a second group (steps a) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050246164A1
CLAIM 4
. The encoder according to claim 1 , wherein a time parameter is defined indicative of the length of the time t (first frequency) he mode change lasts .

US20050246164A1
CLAIM 5
. The encoder according to claim 4 , wherein the value defined for said time (noise character parameter) parameter is 320 ms .

US20050246164A1
CLAIM 6
. The encoder according to claim 4 , wherein a step value is defined indicative of how large steps a (second group) re to be used at the gradual change of the encoding properties .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
US20050246164A1
CLAIM 5
. The encoder according to claim 4 , wherein the value defined for said time (noise character parameter) parameter is 320 ms .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20060098809A1

Filed: 2005-04-08     Issued: 2006-05-11

Periodic signal enhancement system

(Original Assignee) QNX Software Systems Wavemakers Inc     (Current Assignee) 2236008 Ontario Inc ; 8758271 Canada Inc

Rajeev Nongpiur, David Giesbrecht, Phillip Hetherington
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (adaptive filter coefficient) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US20060098809A1
CLAIM 9
. The signal enhancement system of claim 1 , further comprising a first stage (initial value) filter coupled between the signal input and the delay logic , the first stage filter comprising quasi-stationary frequency tracking and attenuation logic .

US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (adaptive filter coefficient) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (adaptive filter coefficient) comprises locating a maximum between each pair of two consecutive minima (signal enhancement) of the current residual spectrum .
US20060098809A1
CLAIM 1
. A signal enhancement (consecutive minima, two consecutive minima) system comprising : a signal input ;
partitioned delay logic coupled to the signal input ;
a partitioned adaptive filter coupled to the partitioned delay logic and comprising multiple adaptive filter outputs ;
gain logic coupled to an adaptive filter output ;
filter reinforcement logic coupled after the gain logic ;
and signal reinforcement logic coupled to the signal input and the gain logic and comprising an enhanced signal output .

US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .
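
As a sketch of the peak detection recited in claim 3 of US8990073B2 above, each pair of consecutive minima of the residual spectrum delimits one peak, and the maximum is located inside that interval. Treating the two end bins as minima is an assumption, and the function name `detect_peaks` is hypothetical.

```python
def detect_peaks(residual):
    """Return (left_min, right_min, max_index) for each pair of consecutive minima.

    Expects a non-empty residual spectrum.
    """
    interior = [
        i
        for i in range(1, len(residual) - 1)
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]
    ]
    minima = [0] + interior + [len(residual) - 1]
    peaks = []
    for lo, hi in zip(minima, minima[1:]):
        top = max(range(lo, hi + 1), key=lambda i: residual[i])
        peaks.append((lo, hi, top))
    return peaks


print(detect_peaks([0.0, 2.0, 0.0, 2.5, 0.0]))
```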

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (adaptive filter coefficient) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (signal enhancement) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20060098809A1
CLAIM 1
. A signal enhancement (consecutive minima, two consecutive minima) system comprising : a signal input ;
partitioned delay logic coupled to the signal input ;
a partitioned adaptive filter coupled to the partitioned delay logic and comprising multiple adaptive filter outputs ;
gain logic coupled to an adaptive filter output ;
filter reinforcement logic coupled after the gain logic ;
and signal reinforcement logic coupled to the signal input and the gain logic and comprising an enhanced signal output .

US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .
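
As a rough illustration of the correlation map recited in claim 4 of US8990073B2, the sketch below computes one normalized correlation value per peak and spreads it over the bins between the two delimiting minima. The name `correlation_map` is hypothetical, and the cosine-style normalization used here is an assumption; the claim only requires "a normalized correlation value", so the patent's exact formula may differ.

```python
import math


def correlation_map(cur, prev, minima):
    """Build a per-bin correlation map from two residual spectra.

    cur, prev: current and previous residual spectra (equal length).
    minima:    increasing bin indices of the minima in the current spectrum.
    """
    cmap = [0.0] * len(cur)
    for lo, hi in zip(minima, minima[1:]):
        bins = range(lo, hi + 1)
        num = sum(cur[i] * prev[i] for i in bins)
        den = math.sqrt(sum(cur[i] ** 2 for i in bins)
                        * sum(prev[i] ** 2 for i in bins))
        # Assumption: an all-zero segment gets a score of 0.
        score = num / den if den > 0 else 0.0
        # Assign the peak's score to every bin it spans. A minimum shared by
        # two peaks ends up with the score of the later peak.
        for i in bins:
            cmap[i] = score
    return cmap
```

A spectrum whose peaks keep their shape from frame to frame yields values near 1 across the map, which is the behaviour the claim associates with tonal stability.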

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (periodic signal) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060098809A1
CLAIM 10
. A method for signal enhancement comprising : receiving an input signal ;
delaying the input signal by multiple delays ;
processing the multiply delayed input signal in a partitioned adaptive filter comprising multiple adaptive filter outputs ;
biasing an adaptive filter output ;
generating a summed adaptive filter output after biasing ;
and reinforcing periodic signal (frequency bin) content in the input signal with the summed adaptive filter output .
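
The two steps of claim 5 of US8990073B2 (one-pole filtering per frequency bin, then summation over bins) can be sketched as below. This is an illustration under stated assumptions: the names `update_long_term_map` and `summed_long_term_map` are hypothetical, and the update factor `alpha` and its default value are placeholders, not values taken from the patent.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing applied bin by bin.

    long_term: long-term correlation map from the previous frame
               (or its initial value on the first frame).
    cmap:      correlation map of the current frame.
    alpha:     update factor in [0, 1); assumed value, for illustration only.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Starting from an all-zero initial map, repeated calls to `update_long_term_map` let the summed value rise only when the same bins stay correlated over several frames.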

US8990073B2
CLAIM 10
. A method for detecting sound activity (detection output) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20060098809A1
CLAIM 38
. A voice detector comprising : a signal input ;
an adaptive filter coupled to the signal input , the adaptive filter comprising filter coefficients and operable to adapt based on an error signal ;
voice detection logic coupled to the adaptive filter , the voice detection logic operable to determine a detection measure based on the filter coefficients of the adaptive filter to detect voiced speech in the signal ;
a voice detection output (detecting sound activity) coupled to the voice detection logic .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
US20060098809A1
CLAIM 6
. The signal enhancement system of claim 1 , where the gain logic comprises a gain parameter that increases with decreasing signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (adaptive filters) for a next frame .
US20060098809A1
CLAIM 2
. The signal enhancement system of claim 1 , where the partitioned adaptive filter comprises multiple adaptive filters (noise estimates, noise estimator) , each adaptive filter comprising filter coefficients .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (adaptive filter coefficient) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US20060098809A1
CLAIM 9
. The signal enhancement system of claim 1 , further comprising a first stage (initial value) filter coupled between the signal input and the delay logic , the first stage filter comprising quasi-stationary frequency tracking and attenuation logic .

US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (adaptive filter coefficient) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US20060098809A1
CLAIM 9
. The signal enhancement system of claim 1 , further comprising a first stage (initial value) filter coupled between the signal input and the delay logic , the first stage filter comprising quasi-stationary frequency tracking and attenuation logic .

US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (adaptive filter coefficient) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20060098809A1
CLAIM 27
. The pitch detector of claim 26 , where the pitch detection logic is operable to determine a pitch estimate according to : f_a = f_s / (c + Δ_F0MAX) , where f_a is the pitch estimate , f_s is a sampling frequency , c is an index of a peak in the adaptive filter coefficients (current residual spectrum) , and Δ_F0MAX is a maximum pitch period expressed in terms of samples .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (periodic signal) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20060098809A1
CLAIM 10
. A method for signal enhancement comprising : receiving an input signal ;
delaying the input signal by multiple delays ;
processing the multiply delayed input signal in a partitioned adaptive filter comprising multiple adaptive filter outputs ;
biasing an adaptive filter output ;
generating a summed adaptive filter output after biasing ;
and reinforcing periodic signal (frequency bin) content in the input signal with the summed adaptive filter output .

US8990073B2
CLAIM 35
. A device for detecting sound activity (detection output) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20060098809A1
CLAIM 38
. A voice detector comprising : a signal input ;
an adaptive filter coupled to the signal input , the adaptive filter comprising filter coefficients and operable to adapt based on an error signal ;
voice detection logic coupled to the adaptive filter , the voice detection logic operable to determine a detection measure based on the filter coefficients of the adaptive filter to detect voiced speech in the signal ;
a voice detection output (detecting sound activity) coupled to the voice detection logic .

US8990073B2
CLAIM 36
. A device for detecting sound activity (detection output) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20060098809A1
CLAIM 38
. A voice detector comprising : a signal input ;
an adaptive filter coupled to the signal input , the adaptive filter comprising filter coefficients and operable to adapt based on an error signal ;
voice detection logic coupled to the adaptive filter , the voice detection logic operable to determine a detection measure based on the filter coefficients of the adaptive filter to detect voiced speech in the signal ;
a voice detection output (detecting sound activity) coupled to the voice detection logic .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20060098809A1
CLAIM 6
. The signal enhancement system of claim 1 , where the gain logic comprises a gain parameter that increases with decreasing signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (adaptive filters) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20060098809A1
CLAIM 2
. The signal enhancement system of claim 1 , where the partitioned adaptive filter comprises multiple adaptive filters (noise estimates, noise estimator) , each adaptive filter comprising filter coefficients .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050216261A1

Filed: 2005-03-18     Issued: 2005-09-29

Signal processing apparatus and method

(Original Assignee) Canon Inc     (Current Assignee) Canon Inc

Philip Garner, Toshiaki Fukada, Yasuhiro Komori
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (predetermined threshold value, second threshold) indicative of sound activity in the sound signal .
US20050216261A1
CLAIM 1
. A signal processing apparatus comprising : dividing means for dividing an input signal into frames each of which has a predetermined time length ;
detection means for detecting the presence of a signal in the frame ;
filter means for smoothing a detection result from said detection means by using a detection result from said detection means for a past frame ;
and state evaluation means for comparing an output from said filter means with a predetermined threshold value (adaptive threshold) to evaluate a state of the signal on the basis of a comparison result .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value which distinguishes the possible speech state from the speech state , and a second threshold (adaptive threshold) value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .
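
The two-threshold state evaluation recited in claim 5 of US20050216261A1 can be illustrated with the simplified sketch below. It models only the two transitions the claim states explicitly (to the speech state at or above the first threshold, to the silence state below the second threshold); the intermediate "possible speech" and "possible silence" states are omitted, and the name `evaluate_states` and the default initial state are assumptions.

```python
def evaluate_states(filtered, first_threshold, second_threshold, state="silence"):
    """Track a speech/silence state from a smoothed detection measure.

    filtered:         per-frame outputs of the smoothing filter.
    first_threshold:  at or above this value the state becomes "speech".
    second_threshold: below this value the state becomes "silence".
    Values between the two thresholds leave the state unchanged (hysteresis).
    """
    states = []
    for value in filtered:
        if value >= first_threshold:
            state = "speech"
        elif value < second_threshold:
            state = "silence"
        states.append(state)
    return states
```

Both thresholds here are fixed inputs, which is worth keeping in mind when comparing this reference against the "adaptive threshold" language of claim 8 of US8990073B2.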

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal, first threshold value) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal, first threshold value) indicative of an activity of the sound signal .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal, first threshold value) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal, first threshold value) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal, first threshold value) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .
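
The noise character parameter of claim 28 of US8990073B2 (an energy ratio between two groups of frequency bands, followed by a long-term value) can be sketched as below. The function names, the direction of the ratio (first group over second group), the use of a plain sum as the group energy, and the smoothing factor `alpha` are all assumptions made for illustration; the claim itself does not fix them.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    # Assumption: an empty or silent second group maps to infinity.
    return first / second if second > 0 else float("inf")


def update_long_term(prev_long_term, value, alpha=0.9):
    """Recursive long-term average of the per-frame parameter (alpha assumed)."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

For example, `noise_character([2, 2, 1, 1], 2)` compares an energy of 4 in the first two bands with an energy of 2 in the remaining bands.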

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal, first threshold value) inferior than a given fixed threshold .
US20050216261A1
CLAIM 3
. The apparatus according to claim 2 , wherein the audio signal is a speech signal (noise character parameter, activity prediction parameter) .

US20050216261A1
CLAIM 5
. The apparatus according to claim 4 , wherein the predetermined threshold value includes a first threshold value (noise character parameter, activity prediction parameter) which distinguishes the possible speech state from the speech state , and a second threshold value which distinguishes the possible speech state or the possible silence state from the silence state , and said state evaluation means evaluates that the state of the signal has changed to the speech state when the output from said filter means equals or exceeds the first threshold value , and that the state of the signal has changed to the silence state when the output from said filter means is below the second threshold value .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050203735A1

Filed: 2005-03-09     Issued: 2005-09-15

Signal noise reduction

(Original Assignee) International Business Machines Corp     (Current Assignee) International Business Machines Corp

Osamu Ichikawa
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20050203735A1
CLAIM 1
. A noise reduction device comprising : first rank calculating means for calculating a rank for each element included in a first region , depending on a value of the element , the first region having predetermined sizes in the time axis direction and in the frequency axis direction in a noise section of an observed signal indicating variation of a frequency spectrum (frequency spectrum) with time ;
second rank calculating means for calculating a rank for each element included in a second region , depending on a value of the element , the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal ;
and subtraction means for subtracting , from the values of the respective elements in the second region , values based on the values of the respective elements in the first region whose ranks correspond to the ranks of the respective elements in the second region .

US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal (noise component) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050203735A1
CLAIM 1
. A noise reduction device comprising : first rank calculating means for calculating a rank for each element included in a first region , depending on a value of the element , the first region having predetermined sizes in the time axis direction and in the frequency axis direction in a noise section of an observed signal indicating variation of a frequency spectrum (frequency spectrum) with time ;
second rank calculating means for calculating a rank for each element included in a second region , depending on a value of the element , the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal ;
and subtraction means for subtracting , from the values of the respective elements in the second region , values based on the values of the respective elements in the first region whose ranks correspond to the ranks of the respective elements in the second region .

US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .
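
To make the steps of claim 2 of US8990073B2 concrete, the sketch below estimates a spectral floor by connecting the minima and subtracts it from the spectrum. It is an illustration only: the names `spectral_floor` and `residual_spectrum` are hypothetical, straight-line interpolation is assumed as the way of "connecting" the minima, and bins outside the first and last minimum are assumed to have a zero residual.

```python
def spectral_floor(spec, minima):
    """Piecewise-linear floor through the spectrum values at the given minima.

    minima must be strictly increasing bin indices into spec.
    """
    floor = list(spec)  # bins outside the minima range keep floor == spec
    for lo, hi in zip(minima, minima[1:]):
        for i in range(lo, hi + 1):
            t = (i - lo) / (hi - lo)
            floor[i] = spec[lo] + t * (spec[hi] - spec[lo])
    return floor


def residual_spectrum(spec, minima):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    floor = spectral_floor(spec, minima)
    return [s - f for s, f in zip(spec, floor)]
```

By construction the residual is zero at every minimum, so what remains is the set of peaks rising above the floor. The rank-based subtraction of US20050203735A1 claim 1 operates on time-frequency regions rather than on spectral minima, so this sketch follows the target claim, not the reference.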

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (method steps) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20050203735A1
CLAIM 18
. A program storage device readable by machine , tangibly embodying a program of instructions executable by the machine to perform method steps (frequency bin basis) for noise reduction , said method steps comprising the steps of claim 9 .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise component) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (noise component) in the sound signal .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 10
. A method for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (noise component) is detected .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise components (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) in the frequency axis direction .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (noise component) in the sound signal (noise component) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises detecting the sound signal (noise component) based on a frequency dependent signal-to-noise ratio (SNR) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection in the sound signal (noise component) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection further comprises updating the noise estimates for a next frame .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise component) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .
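
As an illustration of the SNR-based decision recited in claims 14, 18 and 19 of US8990073B2, the comparison can be sketched as below. This is a minimal sketch, not the patented implementation: the claims state only that the threshold is a function of the long-term SNR, so the linear form and the names `snr_threshold`, `classify_frame`, `slope` and `offset` are assumptions made for illustration, as is treating the equality case as inactive.

```python
def snr_threshold(snr_lt, slope=0.5, offset=1.0):
    """Hypothetical threshold: a linear function of the long-term SNR.

    The claims do not give the function; slope and offset are placeholders.
    """
    return slope * snr_lt + offset


def classify_frame(snr_av, snr_lt):
    """Return 'active' when the average SNR is larger than the threshold
    (claim 19) and 'inactive' otherwise (claim 18).

    Equality is treated as inactive here; the claims leave that case open.
    """
    return "active" if snr_av > snr_threshold(snr_lt) else "inactive"
```

With these placeholder coefficients, a long-term SNR of 20 gives a threshold of 11, so an average SNR of 12 is classified as active and an average SNR of 5 as inactive.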

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise component) prevents updating of noise energy estimates when a music signal is detected .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise component) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .
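
The spectral diversity calculation of claim 24 can be sketched as below, under stated assumptions. The claim recites a per-band ratio between current-frame and previous-frame energy and a weighted sum over the bands higher than a given number. The function name `spectral_diversity`, the uniform default weights and the small floor `eps` guarding against division by zero are hypothetical and do not come from the patent.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    cur_energy, prev_energy: per-band energies of the current and previous frame.
    weights: optional per-band weights (uniform weights are an assumption).
    eps: hypothetical floor that avoids dividing by a zero energy.
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for i in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total
```

Only the bands above `min_band` contribute, which mirrors the "frequency bands higher than a given number" wording of the claim.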

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal (noise component) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise component) and the complementary non-stationarity parameter .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .
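
The update-prevention condition of claim 27 is a conjunction of two fixed-threshold tests, which can be sketched as below. The function name `prevent_noise_update` and the two default threshold values are placeholders chosen for illustration; the claim says only that both thresholds are given and fixed.

```python
def prevent_noise_update(act_pred, nonstat, th_act=0.8, th_nonstat=5.0):
    """Return True when the noise energy estimates should not be updated.

    Per claim 27, both conditions must hold simultaneously:
    the activity prediction parameter exceeds a first fixed threshold and
    the complementary non-stationarity parameter exceeds a second one.
    The default threshold values are hypothetical.
    """
    return act_pred > th_act and nonstat > th_nonstat
```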

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .
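
The steps of claim 28 (split the bands into two groups, compute one energy value per group, take their ratio, then track a long-term value) can be sketched as below. This is a sketch under stated assumptions: summing the band energies within each group, the first-order smoothing with factor `alpha`, and the names `noise_character` and `update_long_term` are not taken from the patent.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio between the energy of the first n_first bands and the rest.

    Summing band energies per group is an assumption; eps is a hypothetical
    guard against an all-zero second group.
    """
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / max(rest, eps)


def update_long_term(prev_long_term, current, alpha=0.9):
    """Long-term value as exponential smoothing (assumed form and factor)."""
    return alpha * prev_long_term + (1.0 - alpha) * current
```

For band energies [1, 1, 2, 2, 4] and two bands in the first group, the ratio is 2 / 8 = 0.25.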

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20050203735A1
CLAIM 8
. The noise reduction device according to claim 1 , wherein the observed signal is what is obtained by converting a speech signal (noise character parameter, activity prediction parameter) , which a noise component is superimposed on , into a time series of a shot time spectrum by a predetermined frame length and by a predetermined frame cycle ;
the element is present in each frame for each frequency sub-band ;
the first region has a size to be obtained by multiplying a predetermined number of frames by a predetermined number of frequency sub-bands ;
and the second region has a size to be obtained by multiplying a predetermined number of frames by the same number of frequency sub-bands as the first region has .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050203735A1
CLAIM 1
. A noise reduction device comprising : first rank calculating means for calculating a rank for each element included in a first region , depending on a value of the element , the first region having predetermined sizes in the time axis direction and in the frequency axis direction in a noise section of an observed signal indicating variation of a frequency spectrum (frequency spectrum) with time ;
second rank calculating means for calculating a rank for each element included in a second region , depending on a value of the element , the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal ;
and subtraction means for subtracting , from the values of the respective elements in the second region , values based on the values of the respective elements in the first region whose ranks correspond to the ranks of the respective elements in the second region .

US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .
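
To make the first three elements of claim 30 concrete, the sketch below computes a residual spectrum, splits it into peaks between successive minima, and correlates each peak with the same position in the previous residual spectrum. Several choices are assumptions rather than claim language: the floor is piecewise linear between minima, the two end bins are counted as minima, plateaus are not treated as minima, the minima of the input spectrum are reused as peak boundaries (the residual is zero there), and the correlation is a normalized squared cross-correlation. All function names are hypothetical.

```python
def local_minima(x):
    """Indices of strict local minima; both end bins are included (assumption)."""
    if len(x) < 2:
        return list(range(len(x)))
    inner = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]
    return [0] + inner + [len(x) - 1]


def residual_spectrum(spec):
    """Subtract a piecewise-linear floor that connects the minima."""
    mins = local_minima(spec)
    floor = [0.0] * len(spec)
    for a, b in zip(mins, mins[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spec[a] + t * (spec[b] - spec[a])
    return [s - f for s, f in zip(spec, floor)], mins


def correlation_map(cur_res, prev_res, mins):
    """Per-bin map holding, for each peak, its normalized squared correlation
    with the previous residual spectrum over the same bins."""
    cmap = [0.0] * len(cur_res)
    for a, b in zip(mins, mins[1:]):
        bins = range(a, b + 1)
        num = sum(cur_res[i] * prev_res[i] for i in bins)
        den = sum(cur_res[i] ** 2 for i in bins) * sum(prev_res[i] ** 2 for i in bins)
        value = num * num / den if den > 0 else 0.0
        for i in bins:
            cmap[i] = value
    return cmap
```

For the spectrum [1, 3, 1, 4, 1] the minima are at bins 0, 2 and 4, the floor is flat at 1, and the residual is [0, 2, 0, 3, 0]. Correlating that residual with itself yields a map of ones, while correlating it with an all-zero previous residual yields zeros.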

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050203735A1
CLAIM 1
. A noise reduction device comprising : first rank calculating means for calculating a rank for each element included in a first region , depending on a value of the element , the first region having predetermined sizes in the time axis direction and in the frequency axis direction in a noise section of an observed signal indicating variation of a frequency spectrum (frequency spectrum) with time ;
second rank calculating means for calculating a rank for each element included in a second region , depending on a value of the element , the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal ;
and subtraction means for subtracting , from the values of the respective elements in the second region , values based on the values of the respective elements in the first region whose ranks correspond to the ranks of the respective elements in the second region .

US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal (noise component) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050203735A1
CLAIM 1
. A noise reduction device comprising : first rank calculating means for calculating a rank for each element included in a first region , depending on a value of the element , the first region having predetermined sizes in the time axis direction and in the frequency axis direction in a noise section of an observed signal indicating variation of a frequency spectrum (frequency spectrum) with time ;
second rank calculating means for calculating a rank for each element included in a second region , depending on a value of the element , the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal ;
and subtraction means for subtracting , from the values of the respective elements in the second region , values based on the values of the respective elements in the first region whose ranks correspond to the ranks of the respective elements in the second region .

US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (method steps) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20050203735A1
CLAIM 18
. A program storage device readable by machine , tangibly embodying a program of instructions executable by the machine to perform method steps (frequency bin basis) for noise reduction , said method steps comprising the steps of claim 9 .
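
The filter and adder of claim 33 can be sketched as below. The claim says only that the correlation map is filtered on a frequency bin by frequency bin basis and then summed over the bins. The first-order recursive filter, the update factor `alpha`, the all-zero initial value and the function names are assumptions made for this sketch.

```python
def update_long_term_map(long_term_map, cmap, alpha=0.9):
    """Filter the correlation map bin by bin with a first-order recursion.

    alpha is a hypothetical update factor; the recursion form is assumed.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term_map, cmap)]


def summed_long_term_map(long_term_map):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term_map)


# Usage: start from an assumed all-zero initial value and feed one frame.
initial = [0.0, 0.0]
updated = update_long_term_map(initial, [1.0, 1.0], alpha=0.5)
```

The summed value is the kind of scalar that claim 8 compares against an adaptive threshold when detecting strong tones.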

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise component) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 35
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 36
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (noise component) detector .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (noise component) detector comprises a comparator of an average signal (noise component) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (noise component) detector .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise component) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise component) .
US20050203735A1
CLAIM 3
. The noise reduction device according to claim 1 , comprising region setting means for setting a plurality of the first and the second regions in the respective frequency axis directions for each of predetermined increases of a frequency , and for concurrently changing the sizes of the first and second regions , respectively depending on a condition of a distribution of noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) s in the frequency axis direction .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1922659A

Filed: 2005-02-22     Issued: 2007-02-28

Coding mode selection (编码模式选择)

(Original Assignee) Nokia Corporation (诺基亚公司)     

Jari Makinen (雅里·马基南)
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (预定阈值, 超过预定, 阈值进行) indicative of sound activity in the sound signal .
CN1922659A
CLAIM 5
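. The encoder (200) according to claim 4 , characterized in that it is arranged to determine noise on the basis of unstable LTP parameters and/or an average frequency exceeding a predetermined threshold [预定阈值] (adaptive threshold) .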

CN1922659A
CLAIM 48
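. The computer program product according to claim 47 , characterized in that it comprises machine executable steps for examining the stability of said LTP parameters and/or comparing an average frequency with a predefined threshold [阈值进行] (adaptive threshold) to determine noise on said audio signal .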

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (来计算) .
CN1922659A
CLAIM 2
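. The encoder (200) according to claim 1 , characterized in that said parameter analysis block (202) further comprises means for calculating [来计算] (SNR calculation) and analysing a normalised correlation at least on the basis of said LTP parameters .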

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (的噪声) for a next frame (至少基) .
CN1922659A
CLAIM 2
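. The encoder (200) according to claim 1 , characterized in that said parameter analysis block (202) further comprises means for calculating and analysing a normalised correlation at least on the basis of [至少基] (next frame) said LTP parameters .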

CN1922659A
CLAIM 48
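. The computer program product according to claim 47 , characterized in that it comprises machine executable steps for examining the stability of said LTP parameters and/or comparing an average frequency with a predefined threshold to determine noise [的噪声] (noise estimates) on said audio signal .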
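
As a reading aid for claims 15 and 16 of US8990073B2, the following is a minimal Python sketch of an SNR computed against noise energy estimates carried over from the previous frame, followed by an update of those estimates for the next frame. The function names, the smoothing constant `alpha`, the `eps` guard and the `allow_update` flag are illustrative assumptions; neither the claims nor the charted reference fixes them.

```python
import math


def frame_snr_db(band_energies, prev_noise, eps=1e-12):
    """SNR in dB of the current frame against the previous frame's noise estimates.

    Summing over bands is an assumption; a per-band SNR would work the same way.
    """
    signal = sum(band_energies)
    noise = sum(prev_noise)
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def update_noise(prev_noise, band_energies, allow_update, alpha=0.9):
    """Return the noise estimates to be used in the next frame.

    When allow_update is False (for example, music was detected), the
    previous estimates are carried over unchanged.
    """
    if not allow_update:
        return list(prev_noise)
    # Hypothetical first-order smoothing toward the current band energies.
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(prev_noise, band_energies)]
```

The update decision itself (claim 17) is deliberately left outside the sketch and passed in as a boolean.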

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (至少基) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
CN1922659A
CLAIM 2
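. The encoder (200) according to claim 1 , characterized in that said parameter analysis block (202) further comprises means for calculating and analysing a normalised correlation at least on the basis of [至少基] (next frame) said LTP parameters .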




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050192798A1

Filed: 2005-02-22     Issued: 2005-09-01

Classification of audio signals

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Janne Vainio, Hannu Mikkola, Pasi Ojala, Jari Makinen
US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating (narrower bandwidth) of noise energy estimates when a music signal is detected .
US20050192798A1
CLAIM 1
. An encoder comprising an input for inputting frames of an audio signal in a frequency band , at least a first excitation block for performing a first excitation for a speech like audio signal , and a second excitation block for performing a second excitation for a non-speech like audio signal , wherein the encoder further comprises a filter for dividing the frequency band into a plurality of sub bands each having a narrower bandwidth (term signal, sound signal prevents updating) than said frequency band , and an excitation selection block for selecting one excitation block among said at least first excitation block and said second excitation block for performing the excitation for a frame of the audio signal on the basis of the properties of the audio signal at least in one of said sub bands .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (first group) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050192798A1
CLAIM 3
. The encoder according to claim 2 , wherein the encoder is adapted to define at least a first and a second group of sub bands , said second group containing sub bands of higher frequencies than said first group (first group) , and the encoder is further adapted to define a relation between normalised signal energy of said first group of sub bands and normalised signal energy of said second group of sub bands for the frames of the audio signal , and to use said relation in the selection of the excitation block .
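
The noise character parameter of claim 28 of US8990073B2 can be illustrated with a minimal Python sketch: split the bands into a first group and the rest, form the energy ratio, and smooth it into a long-term value. The use of sums as the group energy values, the smoothing constant `alpha` and the `eps` guard are assumptions made for illustration only.

```python
def noise_character(band_energies, n_first, long_term, alpha=0.9, eps=1e-12):
    """Return (ratio, updated long-term value) for one frame.

    band_energies: per-band energies of the current frame
    n_first:       number of bands placed in the first group
    long_term:     long-term value carried over from the previous frame
    """
    first = sum(band_energies[:n_first])   # first energy value
    rest = sum(band_energies[n_first:])    # second energy value
    ratio = first / (rest + eps)
    # Hypothetical first-order smoothing to obtain the long-term value.
    return ratio, alpha * long_term + (1.0 - alpha) * ratio
```

For example, `noise_character([2.0, 2.0, 1.0, 1.0], 2, 0.0, alpha=0.5)` gives a ratio close to 2 and a long-term value close to 1.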




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1957398A

Filed: 2005-02-18     Issued: 2007-05-02

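Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX (在基于代数码激励线性预测/变换编码激励的音频压缩期间低频加重的方法和设备)

(Original Assignee) VoiceAge Corporation (沃伊斯亚吉公司)     

Bruno Bessette (布鲁诺·贝塞特)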
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (频率分量) of the sound signal , the method comprising : calculating a current residual spectrum (系数计算, 获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima (响应之间) of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (的因子) , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1957398A
CLAIM 7
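. A method for low-frequency emphasizing the spectrum of a sound signal as defined in claim 1 , wherein calculating a factor [的因子] (update factor) for each block comprises : computing a ratio Rm for each block with a position index m smaller than the position index of the block with maximum energy , using the relation Rm = Emax / Em , where Emax is the calculated maximum energy and Em is the calculated energy of the block corresponding to position index m .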

CN1957398A
CLAIM 35
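. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .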

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

CN1957398A
CLAIM 41
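. An HF coding method as defined in claim 35 , wherein expressing the calculated ratio as an HF gain comprises : expressing in dB the calculated ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal .
41a . An HF coding method as defined in claim 35 , wherein calculating an HF matching gain comprises : computing a ratio between the frequency responses [响应之间] (successive minima) of the LF LPC filter and the HF LPC filter at the Nyquist frequency .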

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (系数计算, 获得的) comprises : searching for the minima in the frequency spectrum (频率分量) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
CN1957398A
CLAIM 35
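. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .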

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .
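
For orientation, the three steps recited in claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the linear interpolation between minima, the treatment of the first and last bins as floor anchors, and the function names are all assumptions.

```python
def local_minima(spectrum):
    """Indices of local minima; the two end bins are added as anchors (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a floor obtained by joining consecutive minima with straight lines."""
    minima = local_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]
```

With the toy spectrum `[0.0, 3.0, 2.0, 7.0, 4.0]` the minima are bins 0, 2 and 4, the floor is the line through them, and only the two bumps at bins 1 and 3 survive in the residual.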

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (系数计算, 获得的) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
CN1957398A
CLAIM 35
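. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .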
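
Claim 3 of US8990073B2 locates one maximum between each pair of consecutive minima of the residual spectrum. A minimal Python sketch of that step is given below; the tuple layout `(low, high, top)` and the function name are hypothetical, and the minima indices are assumed to have been found beforehand.

```python
def detect_peaks(residual, minima):
    """Return one (low, high, top) tuple per peak.

    low and high are the delimiting minima; top is the bin of the maximum
    between them (the first such bin if several are equal).
    """
    peaks = []
    for low, high in zip(minima, minima[1:]):
        top = max(range(low, high + 1), key=lambda i: residual[i])
        peaks.append((low, high, top))
    return peaks
```

For the residual `[0.0, 2.0, 0.0, 4.0, 0.0]` with minima at bins 0, 2 and 4, the sketch reports two peaks whose maxima sit at bins 1 and 3.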

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (系数计算, 获得的) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (频率分量) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
CN1957398A
CLAIM 35
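. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .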

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .
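
The correlation map of claim 4 of US8990073B2 assigns to every bin of a peak the normalized correlation between the current and previous residual spectra over that peak. The Python sketch below is one possible reading under stated assumptions: it uses the ordinary normalized cross-correlation (the patent's description may normalize differently, for example with squared terms), and it lets a later peak overwrite the minimum bin it shares with the preceding peak.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding each peak's normalized correlation value."""
    cmap = [0.0] * len(current)
    for low, high in zip(minima, minima[1:]):
        bins = range(low, high + 1)
        num = sum(current[i] * previous[i] for i in bins)
        den = math.sqrt(sum(current[i] ** 2 for i in bins)
                        * sum(previous[i] ** 2 for i in bins))
        score = num / den if den > 0.0 else 0.0  # score assigned to the peak
        for i in bins:
            cmap[i] = score  # spread the score over the peak's bins
    return cmap
```

A peak that keeps its shape from one frame to the next scores 1, while a peak with no counterpart in the previous frame scores 0.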

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (频率分量) ;

and summing the filtered correlation map over the frequency bins (频率分量) so as to produce a summed long-term correlation map .
CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .
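
Claim 5 of US8990073B2 recites a one-pole filter applied bin by bin, followed by a sum over the bins. A minimal Python sketch, assuming the filter takes the usual first-order recursive form and that `alpha` plays the role of the update factor of claim 1 (its value here is purely illustrative):

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """Filter the correlation map per bin and sum the result.

    long_term: long-term map from the previous frame (or its initial value)
    cmap:      correlation map of the current frame
    Returns (updated long-term map, summed long-term map).
    """
    updated = [alpha * lt + (1.0 - alpha) * c
               for lt, c in zip(long_term, cmap)]
    return updated, sum(updated)
```

Starting from an all-zero initial map, repeated frames with a stable peak drive the corresponding bins toward 1, so the summed value grows for tonally stable signals.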

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (频率分量) having a magnitude that exceeds a given fixed threshold .
CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (产生增益) from a background noise signal (产生增益) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1957398A
CLAIM 42
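. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .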

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (计算装置, 来计算) .
CN1957398A
CLAIM 1
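. A method for low-frequency emphasizing the spectrum of a sound signal transformed in a frequency domain and comprising transform coefficients grouped in a number of blocks , comprising : calculating a maximum energy for one block having a position index ; calculating a factor for each block having a position index smaller than the position index of the block with maximum energy , the calculation of the factor comprising , for each block : - computing an energy of the block ; and - computing [来计算] (SNR calculation) the factor from the calculated maximum energy and the computed energy of the block ; and for each block , determining from the factor a gain applied to the transform coefficients of the block .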

CN1957398A
CLAIM 13
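. A device for low-frequency emphasizing the spectrum of a sound signal transformed in a frequency domain and comprising transform coefficients grouped in a number of blocks , comprising : means for calculating a maximum energy for one block having a position index ; means for calculating a factor for each block having a position index smaller than the position index of the block with maximum energy , the factor calculating means [计算装置] (SNR calculation) comprising , for each block : - means for computing an energy of the block ; and - means for computing the factor from the calculated maximum energy and the computed energy of the block ; and means for determining , for each block and from the factor , a gain applied to the transform coefficients of the block .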

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (产生增益) is detected .
CN1957398A
CLAIM 42
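. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .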

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (产生增益) from a background noise signal (产生增益) and prevent update of noise energy estimates on the music signal .
CN1957398A
CLAIM 42
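. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .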

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (持续时间) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1957398A
CLAIM 71
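. A method of switching from a first sound signal coding mode to a second sound signal coding mode as defined in claim 67 , comprising : windowing the weighted signal into a TCX frame of predetermined duration [持续时间] (binary decision) after the windowed zero-input response has been removed from the weighted signal .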

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (宽扩展) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1957398A
CLAIM 35
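. An HF coding method for coding , through a bandwidth extension [宽扩展] (first frequency) scheme , an HF signal obtained from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .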

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (频率分量) of the sound signal , the device comprising : means for calculating a current residual spectrum (系数计算, 获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima (响应之间) of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (的因子) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1957398A
CLAIM 7
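. A method for low-frequency emphasizing the spectrum of a sound signal as defined in claim 1 , wherein calculating a factor [的因子] (update factor) for each block comprises : computing a ratio Rm for each block with a position index m smaller than the position index of the block with maximum energy , using the relation Rm = Emax / Em , where Emax is the calculated maximum energy and Em is the calculated energy of the block corresponding to position index m .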

CN1957398A
CLAIM 35
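. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .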

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

CN1957398A
CLAIM 41
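. An HF coding method as defined in claim 35 , wherein expressing the calculated ratio as an HF gain comprises : expressing in dB the calculated ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal .
41a . An HF coding method as defined in claim 35 , wherein calculating an HF matching gain comprises : computing a ratio between the frequency responses [响应之间] (successive minima) of the LF LPC filter and the HF LPC filter at the Nyquist frequency .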

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (频率分量) of the sound signal , the device comprising : a calculator of a current residual spectrum (系数计算, 获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima (响应之间) of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (的因子) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1957398A
CLAIM 7
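. A method for low-frequency emphasizing the spectrum of a sound signal as defined in claim 1 , wherein calculating a factor [的因子] (update factor) for each block comprises : computing a ratio Rm for each block with a position index m smaller than the position index of the block with maximum energy , using the relation Rm = Emax / Em , where Emax is the calculated maximum energy and Em is the calculated energy of the block corresponding to position index m .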

CN1957398A
CLAIM 35
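. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .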

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

CN1957398A
CLAIM 41
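. An HF coding method as defined in claim 35 , wherein expressing the calculated ratio as an HF gain comprises : expressing in dB the calculated ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal .
41a . An HF coding method as defined in claim 35 , wherein calculating an HF matching gain comprises : computing a ratio between the frequency responses [响应之间] (successive minima) of the LF LPC filter and the HF LPC filter at the Nyquist frequency .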

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (系数计算, 获得的) comprises : a locator of the minima in the frequency spectrum (频率分量) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
CN1957398A
CLAIM 35
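. An HF coding method for coding , through a bandwidth extension scheme , an HF signal obtained [获得的] (current residual spectrum) from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ; calculating [系数计算] (current residual spectrum) , from the LPC coefficients , an estimation of an HF matching gain ; calculating the energy of the HF signal ; processing the LF signal to produce a synthesized version of the HF signal ; calculating the energy of the synthesized version of the HF signal ; calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ; and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ; wherein the coded HF signal comprises the LPC parameters and the gain correction .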

CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (频率分量) ;

and an adder for summing the filtered correlation map over the frequency bins (频率分量) so as to produce a summed long-term correlation map .
CN1957398A
CLAIM 36
. An HF coding method as defined in claim 35 , wherein the HF signal is composed of frequency components [频率分量] (frequency spectrum, frequency bins, frequency bin basis) higher than 6400 Hz .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (产生增益) from a background noise signal (产生增益) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1957398A
CLAIM 42
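. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .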

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (产生增益) from a background noise signal (产生增益) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1957398A
CLAIM 42
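. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .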

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (产生增益) from a background noise signal (产生增益) and preventing update of noise energy estimates .
CN1957398A
CLAIM 42
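. An HF coding method as defined in claim 35 , wherein : - performing an LPC analysis comprises computing HF quantized LPC coefficients Â_HF(z) ; and - calculating an estimation of an HF matching gain comprises : computing 64 samples of a decaying sinusoid h(n) at the Nyquist frequency by filtering a unit impulse δ(n) through a one-pole filter of the form 1/(1+0.9z^-1) ; filtering the decaying sinusoid h(n) through an LF LPC filter Â(z) to obtain a low-frequency residual , wherein Â(z) represents LF quantized LPC coefficients from an LF coder ; filtering the filtered decaying sinusoid h(n) through an HF LPC synthesis filter 1/Â_HF(z) to obtain a synthesis signal x(n) ; and computing a multiplicative inverse of an energy of the synthesis signal x(n) and expressing it in the logarithmic domain to produce a gain [产生增益] (music signal, background noise signal) g_match ; and interpolating the gain g_match to produce the estimation of the HF matching gain .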




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
WO2005078706A1

Filed: 2005-02-18     Issued: 2005-08-25

Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx

(Original Assignee) Voiceage Corporation     

Bruno Bessette
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (first wind) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window [first wind] (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (first wind) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window [first wind] (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (first wind) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window [first wind] (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (first wind) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window [first wind] (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (last portion) .
WO2005078706A1
CLAIM 85
. A method for producing an overlap-add target signal as defined in claim 82 , comprising saving in a buffer a last portion (average frame energy, first energy value) of samples of the current frame .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (bandwidth extension) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
WO2005078706A1
CLAIM 35
. An HF coding method for coding , through a bandwidth extension (frequency bands) scheme , an HF signal obtained from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ;
calculating , from the LPC coefficients , an estimation of an HF matching gain ;
calculating the energy of the HF signal ;
processing the LF signal to produce a synthesized version of the HF signal ;
calculating the energy of the synthesized version of the HF signal ;
calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ;
and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ;
wherein the coded HF signal comprises the LPC parameters and the gain correction .
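
To make the target side of this pairing concrete, the spectral diversity parameter of claim 24 of US8990073B2 can be sketched in Python as a weighted sum of per-band energy ratios between the current and previous frames, taken over bands whose index is higher than a given number. The uniform default weights, the `eps` guard and the parameter names are assumptions; the claim does not specify them.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above min_band."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # hypothetical uniform weighting
    return sum(weights[b] * cur_energy[b] / (prev_energy[b] + eps)
               for b in range(min_band + 1, len(cur_energy)))
```

For instance, with band energies `[1.0, 4.0, 6.0]` in the current frame, `[1.0, 2.0, 3.0]` in the previous frame and `min_band=0`, the two upper bands each contribute a ratio of about 2.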

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
WO2005078706A1
CLAIM 67
. A method of switching from a first sound signal coding mode to a second sound signal coding mode at the junction between a previous frame (activity prediction parameter) coded according to the first coding mode and a current frame coded according to the second coding mode , wherein the sound signal is filtered through a weighting filter to produce , in the current frame , a weighted signal , comprising : calculating a zero-input response of the weighting filter ;
windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and in the current frame , removing from the weighted signal the windowed zero-input response .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
WO2005078706A1
CLAIM 67
. A method of switching from a first sound signal coding mode to a second sound signal coding mode at the junction between a previous frame (activity prediction parameter) coded according to the first coding mode and a current frame coded according to the second coding mode , wherein the sound signal is filtered through a weighting filter to produce , in the current frame , a weighted signal , comprising : calculating a zero-input response of the weighting filter ;
windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and in the current frame , removing from the weighted signal the windowed zero-input response .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
WO2005078706A1
CLAIM 67
. A method of switching from a first sound signal coding mode to a second sound signal coding mode at the junction between a previous frame (activity prediction parameter) coded according to the first coding mode and a current frame coded according to the second coding mode , wherein the sound signal is filtered through a weighting filter to produce , in the current frame , a weighted signal , comprising : calculating a zero-input response of the weighting filter ;
windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and in the current frame , removing from the weighted signal the windowed zero-input response .
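
Claims 26 and 27 of US8990073B2, charted above, describe (i) an activity prediction parameter formed as a long-term value of a binary decision and (ii) blocking the noise energy update when that parameter and the complementary non-stationarity parameter both exceed fixed thresholds. The sketch below shows one way this logic could look. The exponential smoothing form, the forgetting factor of 0.9, and all names are assumptions for illustration; the claims do not specify them.

```python
def update_activity_prediction(prev_value, binary_decision, forgetting=0.9):
    """Long-term value of a 0/1 decision via exponential smoothing (assumed form)."""
    return forgetting * prev_value + (1.0 - forgetting) * float(binary_decision)


def noise_update_allowed(act_pred, nonstat, act_thr, nonstat_thr):
    """Return False when both parameters exceed their fixed thresholds at once."""
    blocked = act_pred > act_thr and nonstat > nonstat_thr
    return not blocked
```

The update is blocked only when both conditions hold simultaneously; either parameter alone exceeding its threshold leaves the noise update enabled.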

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (bandwidth extension) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (last portion) for the first group of frequency bands and a second energy value (determined time period, following expressions) of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2005078706A1
CLAIM 29
. A method for processing a received , coded sound signal as defined in claim 28 , wherein computing for each of the first K/s blocks , up to a position index of the block with maximum energy , a factor fac_k comprises using the following expressions (second energy value) : $fac_0 = \max((\varepsilon_0 / \varepsilon_{max})^{0.5} , 0.1)$ and $fac_k = \max((\varepsilon_k / \varepsilon_{max})^{0.5} , fac_{k-1})$ for k=1 , . . . , K/s-1 , where $\varepsilon_k$ is the energy of the block with index k .

WO2005078706A1
CLAIM 35
. An HF coding method for coding , through a bandwidth extension (frequency bands) scheme , an HF signal obtained from separation of a full-bandwidth sound signal into the HF signal and a LF signal , comprising : performing an LPC analysis on the LF and HF signals to produce LPC coefficients which model a spectral envelope of the LF and HF signals ;
calculating , from the LPC coefficients , an estimation of an HF matching gain ;
calculating the energy of the HF signal ;
processing the LF signal to produce a synthesized version of the HF signal ;
calculating the energy of the synthesized version of the HF signal ;
calculating a ratio between the calculated energy of the HF signal and the calculated energy of the synthesized version of the HF signal , and expressing the calculated ratio as an HF compensating gain ;
and calculating a difference between the estimation of the HF matching gain and the HF compensating gain to obtain a gain correction ;
— wherein the coded HF signal comprises the LPC parameters and the gain correction .

WO2005078706A1
CLAIM 67
. A method of switching from a first sound signal coding mode to a second sound signal coding mode at the junction between a previous frame coded according to the first coding mode and a current frame coded according to the second coding mode , wherein the sound signal is filtered through a weighting filter to produce , in the current frame , a weighted signal , comprising : calculating a zero-input response of the weighting filter ;
windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period (second energy value) ;
and in the current frame , removing from the weighted signal the windowed zero-input response .

WO2005078706A1
CLAIM 85
. A method for producing an overlap-add target signal as defined in claim 82 , comprising saving in a buffer a last portion (average frame energy, first energy value) of samples of the current frame .
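
The noise character parameter of US8990073B2 claim 28, charted above, is a ratio between the energies of two groups of frequency bands, followed by long-term smoothing. A minimal sketch under stated assumptions: the split index `n_first`, the smoothing factor `alpha`, and the `eps` guard are hypothetical choices that the claim does not fix.

```python
def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])    # first energy value
    second = sum(band_energy[n_first:])   # second energy value
    return first / (second + eps)


def update_long_term(prev_long_term, value, alpha=0.9):
    """Long-term value via exponential smoothing (assumed form)."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

With band energies `[1, 1, 2, 2, 4]` and `n_first=2`, the first group sums to 2 and the second to 8, so the parameter is close to 0.25.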

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (first wind) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (first wind) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (first wind) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
WO2005078706A1
CLAIM 88
. A device for producing from a decoded target signal an overlap-add target signal in a current frame coded according to a first coding mode , comprising : a first window (current residual spectrum) generator for windowing the decoded target signal of the current frame in a given window ;
means for skipping a left portion of the window ;
a calculator of a zero-input response of a weighting filter of the previous frame coded according to a second coding mode , and a second window generator for windowing the zero-input response so that said zero-input response has an amplitude monotonically decreasing to zero after a predetermined time period ;
and an adder for adding the calculated zero-input response to the decoded target signal to reconstruct said overlap-add target signal .




US20050177363A1

Filed: 2005-02-07     Issued: 2005-08-11

Apparatus, method, and medium for detecting voiced sound and unvoiced sound

(Original Assignee) Samsung Electronics Co Ltd     (Current Assignee) Samsung Electronics Co Ltd

Kwangcheol Oh
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (second threshold) indicative of sound activity in the sound signal .
US20050177363A1
CLAIM 3
. The method of claim 1 , wherein the determining of the voiced sound zone and the unvoiced sound zone comprises : comparing a first signal waveform obtained by applying the first parameter obtained from the slope to the input signal of the block and a first threshold value ;
comparing a second signal waveform obtained by applying the second parameter obtained from the slope and SFM to the input signal of the block and a second threshold (adaptive threshold) value ;
determining a zone , which has a value larger than the first threshold value in the first signal waveform as a result of the comparing of the first signal waveform and the first threshold value , as a voiced sound zone ;
and determining a zone , which has a value larger than the second threshold value in the second signal waveform as a result of the comparing of the second signal waveform and the second threshold value , as an unvoiced sound zone .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (first slope) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (first slope) is detected .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (first slope) from a background noise signal and prevent update of noise energy estimates on the music signal .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (predetermined threshold values) energy .
US20050177363A1
CLAIM 1
. A method of detecting a voiced sound and an unvoiced sound , the method comprising : dividing an input signal into block units ;
calculating a slope and a spectral flatness measure (SFM) of a mel-scaled filter bank spectrum ;
calculating a first parameter to determine the voiced sound and a second parameter to determine the unvoiced sound by using the slope and the spectral flatness measure (SFM) of the mel-scaled filter bank spectrum of the input signal existing in a block ;
and determining a voiced sound zone and an unvoiced sound zone in the block by comparing the first and the second parameters to predetermined threshold values (average frame) .
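
Claim 1 of US20050177363A1 above relies on a slope and a spectral flatness measure (SFM) computed over a mel-scaled filter bank spectrum. The sketch below uses the standard SFM definition (geometric mean divided by arithmetic mean) and an ordinary least-squares slope over the band index. Whether the reference computes them exactly this way is an assumption; the function names are illustrative, and the inputs are taken to be strictly positive filter bank outputs.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean; 1.0 for a flat spectrum, near 0 for a peaky one."""
    n = len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / n)  # requires p > 0
    arithmetic = sum(power) / n
    return geometric / arithmetic


def spectral_slope(values):
    """Least-squares slope of values against band index 0..n-1 (needs n >= 2)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

A flat input such as `[2, 2, 2, 2]` yields a flatness of about 1, while a linearly rising input such as `[1, 3, 5, 7]` yields a slope of 2 per band.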

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (first threshold value) indicative of an activity of the sound signal .
US20050177363A1
CLAIM 3
. The method of claim 1 , wherein the determining of the voiced sound zone and the unvoiced sound zone comprises : comparing a first signal waveform obtained by applying the first parameter obtained from the slope to the input signal of the block and a first threshold value (activity prediction parameter) ;
comparing a second signal waveform obtained by applying the second parameter obtained from the slope and SFM to the input signal of the block and a second threshold value ;
determining a zone , which has a value larger than the first threshold value in the first signal waveform as a result of the comparing of the first signal waveform and the first threshold value , as a voiced sound zone ;
and determining a zone , which has a value larger than the second threshold value in the second signal waveform as a result of the comparing of the second signal waveform and the second threshold value , as an unvoiced sound zone .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (first threshold value) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20050177363A1
CLAIM 3
. The method of claim 1 , wherein the determining of the voiced sound zone and the unvoiced sound zone comprises : comparing a first signal waveform obtained by applying the first parameter obtained from the slope to the input signal of the block and a first threshold value (activity prediction parameter) ;
comparing a second signal waveform obtained by applying the second parameter obtained from the slope and SFM to the input signal of the block and a second threshold value ;
determining a zone , which has a value larger than the first threshold value in the first signal waveform as a result of the comparing of the first signal waveform and the first threshold value , as a voiced sound zone ;
and determining a zone , which has a value larger than the second threshold value in the second signal waveform as a result of the comparing of the second signal waveform and the second threshold value , as an unvoiced sound zone .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (first threshold value) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050177363A1
CLAIM 3
. The method of claim 1 , wherein the determining of the voiced sound zone and the unvoiced sound zone comprises : comparing a first signal waveform obtained by applying the first parameter obtained from the slope to the input signal of the block and a first threshold value (activity prediction parameter) ;
comparing a second signal waveform obtained by applying the second parameter obtained from the slope and SFM to the input signal of the block and a second threshold value ;
determining a zone , which has a value larger than the first threshold value in the first signal waveform as a result of the comparing of the first signal waveform and the first threshold value , as a voiced sound zone ;
and determining a zone , which has a value larger than the second threshold value in the second signal waveform as a result of the comparing of the second signal waveform and the second threshold value , as an unvoiced sound zone .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (first slope) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (first slope) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (third slope) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20050177363A1
CLAIM 6
. The method of claim 3 , wherein the first parameter is obtained using a first slope calculated at an entire frequency area of the mel-scaled filter bank spectrum , a second slope calculated at a predetermined low frequency area of the entire frequency area , and a third slope (average signal) calculated at a predetermined high frequency area of the entire frequency area .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (first slope) from a background noise signal and preventing update of noise energy estimates .
US20050177363A1
CLAIM 4
. The method of claim 3 , wherein the first parameter is obtained using a first slope (music signal) calculated at an entire frequency area of the mel-scaled filter bank spectrum .




US20050177364A1

Filed: 2005-01-19     Issued: 2005-08-11

Methods and devices for source controlled variable bit-rate wideband speech coding

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Milan Jelinek
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value (initial value) of the long term correlation map .
US20050177364A1
CLAIM 18
. A method according to claim 16 , comprising computing the long-term average energy value according to the formula : $\bar{E}_f = 0.99 \, \bar{E}_f + 0.01 \, E_t$ where $\bar{E}_f$ has an initial value (initial value) of 45 dB .
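
The long-term average energy recursion in claim 18 of US20050177364A1 above is a first-order (exponential) average with coefficients 0.99 and 0.01 and an initial value of 45 dB. A minimal sketch, with an illustrative function name and a made-up sequence of frame energies:

```python
INITIAL_LONG_TERM_DB = 45.0  # initial value stated in the claim


def update_long_term_energy(e_long_term_db, e_frame_db):
    """One step of E_f <- 0.99 * E_f + 0.01 * E_t (values in dB)."""
    return 0.99 * e_long_term_db + 0.01 * e_frame_db


# Usage: track the average over a few hypothetical frame energies.
e_bar = INITIAL_LONG_TERM_DB
for e_t in (55.0, 55.0, 55.0):
    e_bar = update_long_term_energy(e_bar, e_t)
```

Because the new frame contributes only 1 percent per step, the average moves slowly: a single 55 dB frame raises the 45 dB starting value to about 45.1 dB.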

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame (dE) , and a relative energy of the current frame ($E_{rel}$) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame (third threshold) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame (dE) , and a relative energy of the current frame ($E_{rel}$) .
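
Claim 2 of US8990073B2, charted above, builds the residual spectrum in three steps: search for the minima, connect them to form a spectral floor, and subtract the floor. The sketch below is one possible reading. Treating the first and last bins as minima and connecting minima with straight lines are assumptions, and all names are illustrative.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are included by assumption."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Piecewise-linear floor passing through the spectrum at each minimum."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor, and the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

For the input `[1, 3, 1, 5, 2]` the minima are at bins 0, 2 and 4, the floor is `[1, 1, 1, 1.5, 2]`, and the residual is `[0, 2, 0, 3.5, 0]`, so the residual is zero at every minimum and each stretch between two minima forms one peak.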

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (frequency bins) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20050177364A1
CLAIM 10
. A method according to claim 8 , further comprising determining a speech pitch period and for speech pitch periods shorter than a predetermined value , computing the low frequency energy measure ($\bar{E}_l$) by summing the energy within frequency bins (frequency bins) resulting from spectral analysis of the current frame and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation according to the formula : $\bar{E}_l = \frac{1}{cnt} \sum_{k=K_{min}}^{24} E_{BIN}(k) \, w_h(k)$ where $E_{BIN}(k)$ are energies within frequency bins , $K_{min}$ is the index of the first frequency bin taken into account in the summation , cnt is the number of non-zero terms in the summation , and $w_h(k)$ is set to 1 when the distance between the frequency bin and the nearest harmonic is not larger than a predetermined frequency threshold and $w_h(k)$ is set to zero otherwise .
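
The correlation map of US8990073B2 claim 4, charted above, assigns to every bin of a peak the normalized correlation between that peak and the same bins of the previous residual spectrum. A minimal sketch follows. The cosine-style normalization (cross term divided by the square root of the product of energies) is an assumption, since the claim says only "normalized correlation value"; the names are illustrative, and `minima` is the list of bin indices delimiting the peaks.

```python
import math


def correlation_map(curr_res, prev_res, minima):
    """Per-bin map holding each peak's normalized correlation score."""
    cmap = [0.0] * len(curr_res)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)  # bins between two consecutive minima
        cross = sum(curr_res[i] * prev_res[i] for i in bins)
        e_curr = sum(curr_res[i] ** 2 for i in bins)
        e_prev = sum(prev_res[i] ** 2 for i in bins)
        den = math.sqrt(e_curr * e_prev)
        score = cross / den if den > 0.0 else 0.0
        for i in bins:
            # A minimum shared by two adjacent peaks ends up with the later peak's score.
            cmap[i] = score
    return cmap
```

A peak that is unchanged from the previous frame scores 1, while a peak with no counterpart in the previous frame scores 0.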

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US20050177364A1
CLAIM 10
. A method according to claim 8 , further comprising determining a speech pitch period and for speech pitch periods shorter than a predetermined value , computing the low frequency energy measure ($\bar{E}_l$) by summing the energy within frequency bins (frequency bins) resulting from spectral analysis of the current frame and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation according to the formula : $\bar{E}_l = \frac{1}{cnt} \sum_{k=K_{min}}^{24} E_{BIN}(k) \, w_h(k)$ where $E_{BIN}(k)$ are energies within frequency bins , $K_{min}$ is the index of the first frequency bin taken into account in the summation , cnt is the number of non-zero terms in the summation , and $w_h(k)$ is set to 1 when the distance between the frequency bin and the nearest harmonic is not larger than a predetermined frequency threshold and $w_h(k)$ is set to zero otherwise .
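
Claim 5 of US8990073B2, charted above, filters the correlation map through a one-pole filter bin by bin and then sums the result over the bins. The sketch below shows that structure. The update factor of 0.9 and the zero initial map are illustrative assumptions, as are the function names.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (exponential) filter applied independently to each frequency bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum of the filtered map over all frequency bins."""
    return sum(long_term)


# Usage: start from an all-zero map (assumed initial value) and feed one frame.
lt_map = [0.0, 0.0, 0.0]
lt_map = update_long_term_map(lt_map, [1.0, 1.0, 0.0])
total = summed_long_term_map(lt_map)
```

Bins that keep scoring high frame after frame accumulate toward 1, so a large sum suggests spectral peaks that persist over time.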

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency bins) having a magnitude that exceeds a given fixed threshold .
US20050177364A1
CLAIM 10
. A method according to claim 8 , further comprising determining a speech pitch period and for speech pitch periods shorter than a predetermined value , computing the low frequency energy measure ($\bar{E}_l$) by summing the energy within frequency bins (frequency bins) resulting from spectral analysis of the current frame and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation according to the formula : $\bar{E}_l = \frac{1}{cnt} \sum_{k=K_{min}}^{24} E_{BIN}(k) \, w_h(k)$ where $E_{BIN}(k)$ are energies within frequency bins , $K_{min}$ is the index of the first frequency bin taken into account in the summation , cnt is the number of non-zero terms in the summation , and $w_h(k)$ is set to 1 when the distance between the frequency bin and the nearest harmonic is not larger than a predetermined frequency threshold and $w_h(k)$ is set to zero otherwise .
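
The harmonic-weighted low-frequency energy in claim 10 of US20050177364A1 above averages the bin energies from K_min to 24, keeping only bins close enough to a pitch harmonic. A rough sketch under several assumptions not stated in the claim: bin k is taken to sit at `k * bin_hz`, harmonics are multiples of `pitch_hz` starting at the first harmonic, and the parameter names are hypothetical.

```python
def low_frequency_energy(bin_energy, bin_hz, pitch_hz, k_min, max_dist_hz, k_max=24):
    """Average of E_BIN(k) * w_h(k) over the non-zero terms for k in k_min..k_max."""
    total = 0.0
    cnt = 0
    for k in range(k_min, k_max + 1):
        freq = k * bin_hz  # assumed bin centre frequency
        harmonic = max(1, round(freq / pitch_hz)) * pitch_hz  # nearest harmonic
        w_h = 1 if abs(freq - harmonic) <= max_dist_hz else 0
        term = bin_energy[k] * w_h
        if term != 0:
            total += term
            cnt += 1  # cnt counts the non-zero terms
    return total / cnt if cnt else 0.0
```

With 50 Hz bins and a 200 Hz pitch, only every fourth bin falls exactly on a harmonic, so a zero distance threshold keeps bins 4, 8, 12, 16, 20 and 24.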

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (second threshold) indicative of sound activity (high frequencies) in the sound signal .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold (adaptive threshold) value ;
and said energy variation is smaller than a predetermined third threshold value .

US8990073B2
CLAIM 10
. A method for detecting sound activity (high frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (high frequencies) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection further comprises updating the noise estimates for a next frame (when r) .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US20050177364A1
CLAIM 12
. A method according to claim 8 , further comprising identifying an a priori unvoiced sound when (next frame) $r_x(0) + r_x(1) + r_e < 0.6$ and computing the low frequency energy measure ($\bar{E}_l$) according to the formula : $\bar{E}_l = \frac{1}{i} \sum_{k=0}^{i-1} E_{CB}(k)$ where $E_{CB}(k)$ is the energy of perceptual critical band k .
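Illustrative sketch (not part of either patent): claim 12 of US20050177364A1 pairs a simple test on the normalized correlations with an average over the first i critical bands. Both function names are hypothetical; the constant 0.6 is taken from the claim.

```python
def is_a_priori_unvoiced(r_x0, r_x1, r_e):
    """Test r_x(0) + r_x(1) + r_e < 0.6 from the claim."""
    return r_x0 + r_x1 + r_e < 0.6


def low_freq_energy_cb(e_cb, i):
    """Mean energy of the first i perceptual critical bands."""
    return sum(e_cb[:i]) / i
```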

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (when r) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (Comfort Noise) .
US20050177364A1
CLAIM 1
. A source-controlled Variable bit-rate Multi-mode WideBand (VMR-WB) codec comprising a unit operable with an Adaptive Multi-Rate wideband (AMR-WB) codec , where in a VMR-WB encoding/AMR-WB decoding case , speech frames are encoded in an AMR-WB interoperable mode of a VMR-WB encoder using one of bit rates corresponding to Interoperable-Full Rate (I-FR) for active speech frames , Interoperable-Half Rate (I-HR) at least for dim-and-burst signaling , Quarter Rate-Comfort Noise (linear prediction residual error energies) Generator (CNG-QR) to encode at least relevant background noise frames and Eighth Rate-Comfort Noise Generator (CNG-ER) frames for background noise frames not encoded as CNG-QR frames , said unit responsive to a case that voice activity is not detected for using CNG-ER encoding , further responsive to a case that voice activity is detected , and responsive to a voiced versus unvoiced classification such that if a frame is classified as unvoiced , the frame is encoded with one of Unvoiced HR or Unvoiced QR encoding , further responsive to a frame not being classified as unvoiced for using a stable voiced classification , and if the frame is classified as stable voiced , encoded the frame using Voiced HR encoding , else assuming the frame to likely contain a non-stationary speech segment for using an appropriate FR encoding , whereas a frame with low energy , and not detected as at least a background or an unvoiced frame , is encoded using generic HR coding to reduce the average data rate ;
an unvoiced classification decision being based on at least some of a voicing measure $\bar{r}_x$ , a spectral tilt $e_t$ , an energy variation within a frame $dE$ , and a relative frame energy $E_{rel}$ , where decision thresholds are set based at least in part on an operating mode comprising a required average data rate .

US20050177364A1
CLAIM 12
. A method according to claim 8 , further comprising identifying an a priori unvoiced sound when (next frame) $r_x(0) + r_x(1) + r_e < 0.6$ and computing the low frequency energy measure ($\bar{E}_l$) according to the formula : $\bar{E}_l = \frac{1}{i} \sum_{k=0}^{i-1} E_{CB}(k)$ where $E_{CB}(k)$ is the energy of perceptual critical band k .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating (frequency noise) of noise energy estimates when a music signal is detected .
US20050177364A1
CLAIM 13
. A method according to claim 7 , further comprising : computing a measure ($N_h$) representative of a noise energy of the current frame at high frequencies by calculating an average of the noise energies of the last two perceptual critical bands ;
computing a measure ($N_l$) representative of a noise energy of the current frame at low frequencies by calculating an average of the noise energies in the first i perceptual critical bands ;
subtracting the high frequency noise (average signal, sound signal prevents updating) measure ($N_h$) from the high frequency energy measure ($\bar{E}_h$) to obtain a high frequency energy ($E_h$) ;
subtracting the low frequency noise measure ($N_l$) from the low frequency energy measure ($\bar{E}_l$) to obtain a low frequency energy ($E_l$) ;
and computing the spectral tilt measure ($e_{tilt}$) as a ratio of the low frequency energy ($E_l$) divided by the high frequency energy ($E_h$) .
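Illustrative sketch (not part of either patent): the noise-compensated spectral tilt of claim 13 of US20050177364A1 might be coded as below. The function and argument names are hypothetical, and the guard against a non-positive high-frequency energy is an added assumption that the claim does not address.

```python
def spectral_tilt(e_h_bar, e_l_bar, noise_cb, i):
    """Ratio of noise-compensated low- to high-frequency energy.

    noise_cb holds per-critical-band noise energies; i is the number of
    low-frequency bands averaged for N_l.
    """
    n_h = sum(noise_cb[-2:]) / 2   # average noise of the last two bands
    n_l = sum(noise_cb[:i]) / i    # average noise of the first i bands
    e_h = e_h_bar - n_h
    e_l = e_l_bar - n_l
    if e_h <= 0:
        # Assumption: the claim does not say how this case is handled.
        raise ValueError("high-frequency energy must exceed its noise estimate")
    return e_l / e_h
```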

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (third threshold) energy and an average frame energy .
US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (third threshold) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .
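Illustrative sketch (not part of either patent): the spectral diversity calculation recited in claim 24 of US8990073B2 above can be read as a weighted sum of per-band energy ratios. The weights are not specified in the claim, so uniform weights are assumed by default; all names are hypothetical.

```python
def spectral_diversity(e_cur, e_prev, band_min, weights=None):
    """Weighted sum of current/previous energy ratios for bands above band_min."""
    if weights is None:
        weights = [1.0] * len(e_cur)   # assumption: uniform weighting
    return sum(
        weights[b] * e_cur[b] / e_prev[b]
        for b in range(band_min + 1, len(e_cur))
    )
```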

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (first threshold value, previous frame) indicative of an activity of the sound signal .
US20050177364A1
CLAIM 15
. A method according to claim 14 , further comprising computing an average spectral tilt ($e_t$) according to the formula : $e_t = \frac{1}{3} \left( e_{old} + e_{tilt}(0) + e_{tilt}(1) \right)$ where $e_{old}$ is a spectral tilt measure obtained from spectral analysis of the second half of the previous frame (activity prediction parameter) .

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value (activity prediction parameter) ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold value .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (first threshold value, previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20050177364A1
CLAIM 15
. A method according to claim 14 , further comprising computing an average spectral tilt ($e_t$) according to the formula : $e_t = \frac{1}{3} \left( e_{old} + e_{tilt}(0) + e_{tilt}(1) \right)$ where $e_{old}$ is a spectral tilt measure obtained from spectral analysis of the second half of the previous frame (activity prediction parameter) .

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value (activity prediction parameter) ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold value .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (first threshold value, previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050177364A1
CLAIM 15
. A method according to claim 14 , further comprising computing an average spectral tilt ($e_t$) according to the formula : $e_t = \frac{1}{3} \left( e_{old} + e_{tilt}(0) + e_{tilt}(1) \right)$ where $e_{old}$ is a spectral tilt measure obtained from spectral analysis of the second half of the previous frame (activity prediction parameter) .

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value (activity prediction parameter) ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold value .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050177364A1
CLAIM 10
. A method according to claim 8 , further comprising determining a speech pitch period and for speech pitch periods shorter than a predetermined value , computing the low frequency energy measure ($\bar{E}_l$) by summing the energy within frequency bins resulting from spectral analysis of the current frame and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation according to the formula : $\bar{E}_l = \frac{1}{cnt} \sum_{k=K_{min}}^{24} E_{BIN}(k)\, w_h(k)$ where $E_{BIN}(k)$ are energies within frequency bins , $K_{min}$ is the index of the first frequency (first frequency) bin taken into account in the summation , $cnt$ is the number of non-zero terms in the summation , and $w_h(k)$ is set to 1 when the distance between the frequency bin and the nearest harmonic is not larger than a predetermined frequency threshold and $w_h(k)$ is set to zero otherwise .
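Illustrative sketch (not part of either patent): the noise character parameter of claim 28 of US8990073B2 above splits the bands into two groups, takes an energy ratio, and tracks a long-term value. Summing band energies and using a first-order smoother with factor `alpha` are both assumptions; the claim fixes neither.

```python
def noise_character(e_band, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    e_first = sum(e_band[:n_first])
    e_rest = sum(e_band[n_first:])
    return e_first / e_rest


def update_long_term(previous, current, alpha=0.9):
    """First-order recursive average (alpha is a hypothetical smoothing factor)."""
    return alpha * previous + (1.0 - alpha) * current
```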

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value (initial value) of the long-term correlation map .
US20050177364A1
CLAIM 18
. A method according to claim 16 , comprising computing the long-term average energy value according to the formula : $\bar{E}_f = 0.99\, \bar{E}_f + 0.01\, E_t$ where $\bar{E}_f$ has an initial value (initial value) of 45 dB .
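Illustrative sketch (not part of either patent): the recursion in claim 18 of US20050177364A1 is a first-order exponential average with the constants 0.99 and 0.01 and a 45 dB starting value taken from the claim. The function and constant names are hypothetical.

```python
INITIAL_LONG_TERM_ENERGY_DB = 45.0  # initial value stated in the claim


def update_long_term_energy(e_f, e_t):
    """One update step: E_f <- 0.99 * E_f + 0.01 * E_t."""
    return 0.99 * e_f + 0.01 * e_t
```

With these constants a single frame moves the average by only 1% of the gap between the frame energy and the running value, so the estimate reacts slowly to level changes.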

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame ($dE$) , and a relative energy of the current frame ($E_{rel}$) .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value (initial value) of the long-term correlation map .
US20050177364A1
CLAIM 18
. A method according to claim 16 , comprising computing the long-term average energy value according to the formula : $\bar{E}_f = 0.99\, \bar{E}_f + 0.01\, E_t$ where $\bar{E}_f$ has an initial value (initial value) of 45 dB .

US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame ($dE$) , and a relative energy of the current frame ($E_{rel}$) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame (third threshold) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050177364A1
CLAIM 27
. A method according to claim 26 , wherein when the sampled speech signal is encoded in Premium mode and the current frame is classified as an unvoiced frame , the current frame is encoded at said half-rate encoding bit-rate when the following conditions are fulfilled : said voicing measure is smaller than a predetermined first threshold value ;
and said spectral tilt measure is smaller than a predetermined second threshold value ;
and said energy variation is smaller than a predetermined third threshold (current frame, current frame energy) value .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame ($dE$) , and a relative energy of the current frame ($E_{rel}$) .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US20050177364A1
CLAIM 10
. A method according to claim 8 , further comprising determining a speech pitch period and for speech pitch periods shorter than a predetermined value , computing the low frequency energy measure ($\bar{E}_l$) by summing the energy within frequency bins (frequency bins) resulting from spectral analysis of the current frame and only frequency bins sufficiently close to the speech harmonics are taken into account in the summation according to the formula : $\bar{E}_l = \frac{1}{cnt} \sum_{k=K_{min}}^{24} E_{BIN}(k)\, w_h(k)$ where $E_{BIN}(k)$ are energies within frequency bins , $K_{min}$ is the index of the first frequency bin taken into account in the summation , $cnt$ is the number of non-zero terms in the summation , and $w_h(k)$ is set to 1 when the distance between the frequency bin and the nearest harmonic is not larger than a predetermined frequency threshold and $w_h(k)$ is set to zero otherwise .

US8990073B2
CLAIM 35
. A device for detecting sound activity (high frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 36
. A device for detecting sound activity (high frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (high frequencies) detector .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (high frequencies) detector comprises a comparator of an average signal (frequency noise) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US20050177364A1
CLAIM 13
. A method according to claim 7 , further comprising : computing a measure ($N_h$) representative of a noise energy of the current frame at high frequencies by calculating an average of the noise energies of the last two perceptual critical bands ;
computing a measure ($N_l$) representative of a noise energy of the current frame at low frequencies by calculating an average of the noise energies in the first i perceptual critical bands ;
subtracting the high frequency noise (average signal, sound signal prevents updating) measure ($N_h$) from the high frequency energy measure ($\bar{E}_h$) to obtain a high frequency energy ($E_h$) ;
subtracting the low frequency noise measure ($N_l$) from the low frequency energy measure ($\bar{E}_l$) to obtain a low frequency energy ($E_l$) ;
and computing the spectral tilt measure ($e_{tilt}$) as a ratio of the low frequency energy ($E_l$) divided by the high frequency energy ($E_h$) .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (high frequencies) detector .
US20050177364A1
CLAIM 6
. A method according to claim 2 , wherein the spectral tilt is proportional to a ratio between the energy of the current frame at low frequencies and the energy of the current frame at high frequencies (sound activity) .

US20050177364A1
CLAIM 67
. A speech encoder (frequency spectrum, noise estimator) , responsive to a current frame being classified as an active speech frame , for encoding said current frame using an unvoiced signal coding algorithm , wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set : a voicing measure ($r_x$ , $\bar{r}_x$) , a spectral tilt measure ($e_{tilt}$ , $e_t$) , an energy variation within the current frame ($dE$) , and a relative energy of the current frame ($E_{rel}$) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050267746A1

Filed: 2005-01-19     Issued: 2005-12-01

Method for interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Milan Jelinek, Redwan Salami
US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (Comfort Noise) .
US20050267746A1
CLAIM 1
. An interworking function , comprising a unit operable with a source-controlled Variable bit-rate Multi-mode WideBand (VMR-WB) codec providing a mode of operation that is interoperable with an Adaptive Multi-Rate wideband (AMR-WB) codec , where in a VMR-WB encoding/AMR-WB decoding case , speech frames are encoded in an AMR-WB interoperable mode of a VMR-WB encoder using one of bit rates corresponding to Interoperable-Full Rate (I-FR) for active speech frames , Interoperable-Half Rate (I-HR) at least for dim-and-burst signaling , Quarter Rate-Comfort Noise (linear prediction residual error energies) Generator (CNG-QR) to encode at least relevant background noise frames and Eighth Rate-Comfort Noise Generator (CNG-ER) frames for background noise frames not encoded as CNG-QR frames , said interworking function operable such that , invalid frames are transmitted to an AMR-WB decoder as erased frames ;
I-FR frames are transmitted to the AMR-WB decoder as 12.65 , 8.85 or 6.60 kbps AMR-WB frames depending on the I-FR type ;
CNG-QR frames are transmitted to the AMR-WB decoder as Silence Descriptor Update (SID UPDATE) frames ;
CNG-ER frames are transmitted to the AMR-WB decoder as NO DATA frames ;
and I-HR frames are translated to 12.65 , 8.85 , or 6.60 kbps frames , depending on the frame type , by generating missing algebraic codebook indices , where bits indicating the I-HR type are discarded .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (algebraic codebook) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050267746A1
CLAIM 1
. An interworking function , comprising a unit operable with a source-controlled Variable bit-rate Multi-mode WideBand (VMR-WB) codec providing a mode of operation that is interoperable with an Adaptive Multi-Rate wideband (AMR-WB) codec , where in a VMR-WB encoding/AMR-WB decoding case , speech frames are encoded in an AMR-WB interoperable mode of a VMR-WB encoder using one of bit rates corresponding to Interoperable-Full Rate (I-FR) for active speech frames , Interoperable-Half Rate (I-HR) at least for dim-and-burst signaling , Quarter Rate-Comfort Noise Generator (CNG-QR) to encode at least relevant background noise frames and Eighth Rate-Comfort Noise Generator (CNG-ER) frames for background noise frames not encoded as CNG-QR frames , said interworking function operable such that , invalid frames are transmitted to an AMR-WB decoder as erased frames ;
I-FR frames are transmitted to the AMR-WB decoder as 12.65 , 8.85 or 6.60 kbps AMR-WB frames depending on the I-FR type ;
CNG-QR frames are transmitted to the AMR-WB decoder as Silence Descriptor Update (SID UPDATE) frames ;
CNG-ER frames are transmitted to the AMR-WB decoder as NO DATA frames ;
and I-HR frames are translated to 12.65 , 8.85 , or 6.60 kbps frames , depending on the frame type , by generating missing algebraic codebook (first frequency, first frequency bands) indices , where bits indicating the I-HR type are discarded .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2005173607A

Filed: 2004-12-08     Issued: 2005-06-30

Method and apparatus for generating an upsampled signal of a time-discrete audio signal

(Original Assignee) Coding Technologies Ab

Per Rune Albin Ekstrand, Lars Fredrik Henn, Hans Magnus Kristofer Kjoerling, Lars Gustaf Liljeryd
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (filter) , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2005173607A
CLAIM 1
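. A method for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the method comprising : providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband signals obtained by an L-channel analysis filter bank , where L denotes the number of filter (update factor) bank channels of the L-channel analysis filter bank ; and filtering the analyzed signal of the audio signal using a QL-channel synthesis filter bank having L low-band channels and L(Q−1) high-band channels to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein , in the filtering step , only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and , before the filtering step , a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .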

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (providing means) of the current residual spectrum .
JP2005173607A
CLAIM 9
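. An apparatus for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the apparatus comprising : providing means (consecutive minima) for providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband signals obtained by an L-channel analysis filter bank , where L denotes the number of filter bank channels of the L-channel analysis filter bank ; and a QL-channel synthesis filter bank , having L low-band channels and L(Q−1) high-band channels , for filtering the analyzed signal of the audio signal to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .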

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (providing means) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
JP2005173607A
CLAIM 9
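. An apparatus for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the apparatus comprising : providing means (consecutive minima) for providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband signals obtained by an L-channel analysis filter bank , where L denotes the number of filter bank channels of the L-channel analysis filter bank ; and a QL-channel synthesis filter bank , having L low-band channels and L(Q−1) high-band channels , for filtering the analyzed signal of the audio signal to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .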

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (subband) is detected .
JP2005173607A
CLAIM 1
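. A method for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the method comprising : providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband (tonal sound signal) signals obtained by an L-channel analysis filter bank , where L denotes the number of filter bank channels of the L-channel analysis filter bank ; and filtering the analyzed signal of the audio signal using a QL-channel synthesis filter bank having L low-band channels and L(Q−1) high-band channels to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein , in the filtering step , only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and , before the filtering step , a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .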

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (decimation) energies .
JP2005173607A
CLAIM 2
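. The method for generating an upsampled signal of a time-discrete audio signal according to claim 1 , wherein , in the providing step , the time-discrete audio signal is supplied to an L-channel analysis filter bank , one of the L channels comprising a band-pass filter followed by a decimator , the decimator having a decimation (residual error) factor equal to L .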

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (filter) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2005173607A
CLAIM 1
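. A method for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the method comprising : providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband signals obtained by an L-channel analysis filter bank , where L denotes the number of filter (update factor) bank channels of the L-channel analysis filter bank ; and filtering the analyzed signal of the audio signal using a QL-channel synthesis filter bank having L low-band channels and L(Q−1) high-band channels to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein , in the filtering step , only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and , before the filtering step , a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .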

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (filter) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2005173607A
CLAIM 1
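. A method for generating an upsampled signal of a time-discrete audio signal , the audio signal being sampled at a first sampling rate , the method comprising : providing an analyzed signal of the audio signal , the analyzed signal including L analysis subband signals obtained by an L-channel analysis filter bank , where L denotes the number of filter (update factor) bank channels of the L-channel analysis filter bank ; and filtering the analyzed signal of the audio signal using a QL-channel synthesis filter bank having L low-band channels and L(Q−1) high-band channels to obtain the upsampled signal of the time-discrete audio signal , the upsampled signal having a second sampling rate equal to the first sampling rate multiplied by Q (Q being a factor) ; wherein , in the filtering step , only the L low-band channels of the synthesis filter bank are used so that the upsampled signal of the audio signal has the same bandwidth as the audio signal , and , before the filtering step , a number of subband signals of the L low-band channels is patched to a number of high-band channels so as to extend the bandwidth of the upsampled signal of the audio signal .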




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050165603A1

Filed: 2004-11-23     Issued: 2005-07-28

Method and device for frequency-selective pitch enhancement of synthesized speech

(Original Assignee) VoiceAge Corp     (Current Assignee) VoiceAge Corp

Bruno Bessette, Claude Laflamme, Milan Jelinek, Roch Lefebvre
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (cut-off frequency, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (cut-off frequency, lower band) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .
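
The three steps recited in claim 2 of US8990073B2 above (searching for minima, connecting them into a spectral floor, subtracting the floor) can be sketched in Python as follows. This is only an illustration under stated assumptions: the floor is formed by straight-line interpolation between neighboring minima, the two end bins are treated as minima, and the function names are hypothetical. The patent specification may define these details differently.

```python
def find_minima(spectrum):
    """Indices of local minima; the first and last bins are always included
    (an assumption made so that the floor covers the whole spectrum)."""
    last = len(spectrum) - 1
    idx = [0]
    for i in range(1, last):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if last > 0:
        idx.append(last)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor drawn through the minima."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)  # position between the two minima, 0..1
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]
```

With this construction the residual is zero at every minimum and non-negative only where the spectrum rises above the interpolated floor.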

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (cut-off frequency, lower band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .
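
Claim 3 of US8990073B2 above locates one maximum between each pair of consecutive minima of the residual spectrum. A minimal Python sketch of that idea is given below. The tuple layout (left minimum, right minimum, maximum index) and the treatment of the end bins as minima are assumptions made for illustration.

```python
def find_minima(residual):
    """Indices of local minima; end bins are included by assumption."""
    last = len(residual) - 1
    idx = [0]
    for i in range(1, last):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            idx.append(i)
    if last > 0:
        idx.append(last)
    return idx


def detect_peaks(residual):
    """Return one (left_min, right_min, max_index) tuple per detected peak.

    Each peak is the piece of the residual spectrum between two successive
    minima; its maximum is the largest bin inside that piece.
    """
    minima = find_minima(residual)
    peaks = []
    for a, b in zip(minima, minima[1:]):
        top = max(range(a, b + 1), key=lambda k: residual[k])
        peaks.append((a, b, top))
    return peaks
```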

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (cut-off frequency, lower band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .
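
The correlation map of claim 4 of US8990073B2 above assigns each peak's normalized correlation value to all bins between the minima that delimit it. The Python sketch below uses the ordinary cosine-style normalization, which is an assumption; the patent specification may normalize differently (for instance with a squared numerator). The function name and the convention that a zero-energy segment scores 0.0 are also hypothetical.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin correlation map between current and previous residual spectra.

    `minima` lists the indices of successive minima of `current`; each pair
    of neighbors delimits one peak.
    """
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(current[k] * previous[k] for k in bins)
        den = math.sqrt(sum(current[k] ** 2 for k in bins)
                        * sum(previous[k] ** 2 for k in bins))
        score = num / den if den > 0.0 else 0.0
        # Spread the peak's score over every bin it covers. A minimum shared
        # by two peaks ends up with the score of the later peak.
        for k in bins:
            cmap[k] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores close to 1, which is what makes the map usable as a tonal-stability indicator.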

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (sampling means) indicative of sound activity in the sound signal .
US20050165603A1
CLAIM 56
. A post-processing device as defined in claim 55 , wherein the dividing means comprises sub-band filter means supplied with the decoded sound signal , and wherein the up-sampling means (adaptive threshold) is combined with the sub-band filter means .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (adaptive filter) when a tonal sound signal is detected .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (adaptive filter) calculated in a previous frame in a SNR calculation .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (adaptive filter) for a next frame .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (adaptive filter) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (encoding parameters) and a sixteenth order of linear prediction residual error energies .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US20050165603A1
CLAIM 63
. A sound signal decoder comprising : an input for receiving an encoded sound signal ;
a parameter decoder supplied with the encoded sound signal for decoding sound signal encoding parameters (second order, second group, second energy values) ;
a sound signal decoder supplied with the decoded sound signal encoding parameters for producing a decoded sound signal ;
and a post processing device as recited in claim 32 for post processing the decoded sound signal in view of enhancing a perceived quality of said decoded sound signal .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (adaptive filter) when a music signal is detected .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (adaptive filter) on the music signal .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (encoding parameters) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (encoding parameters) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050165603A1
CLAIM 63
. A sound signal decoder comprising : an input for receiving an encoded sound signal ;
a parameter decoder supplied with the encoded sound signal for decoding sound signal encoding parameters (second order, second group, second energy values) ;
a sound signal decoder supplied with the decoded sound signal encoding parameters for producing a decoded sound signal ;
and a post processing device as recited in claim 32 for post processing the decoded sound signal in view of enhancing a perceived quality of said decoded sound signal .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (cut-off frequency, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (cut-off frequency, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (cut-off frequency, lower band) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050165603A1
CLAIM 27
. A post-processing method as defined in claim 26 , further comprising adding the frequency upper-band signal with the frequency lower band (current residual spectrum, pole filter) signal to form an output post-processed and up-sampled decoded sound signal .

US20050165603A1
CLAIM 31
. A post-processing method as defined in claim 1 , wherein applying post-processing to said at least one of the frequency sub-band signals comprises : determining a pitch value of the decoded sound signal ;
calculating , in relation to the determined pitch value , a high-pass filter with a cut-off frequency (current residual spectrum, pole filter) below a fundamental frequency of the decoded sound signal ;
and processing the decoded sound signal through the calculated high-pass filter .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (adaptive filter) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (adaptive filter) .
US20050165603A1
CLAIM 34
. A post-processing device as defined in claim 32 , wherein the post-processing means comprises adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) means supplied with the decoded sound signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20050240399A1

Filed: 2004-11-22     Issued: 2005-10-27

Signal encoding

(Original Assignee) Nokia Oyj     (Current Assignee) Nokia Technologies Oy

Jari Makinen
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (energy levels) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20050240399A1
CLAIM 3
. A method according to claim 1 , wherein said first set of parameters are based on energy levels (frequency bin basis) of one or more frequency bands associated with the frame .
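
The one-pole filtering and summation recited in claim 5 of US8990073B2 above can be sketched in Python as shown below. The update factor value alpha = 0.9 and the function names are assumptions chosen for illustration; the claims only state that the long-term map depends on an update factor, the current correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing, applied bin by bin.

    long_term: long-term correlation map from the previous frame
               (or its initial value on the first frame)
    cmap:      correlation map of the current frame
    alpha:     hypothetical update factor, 0 <= alpha < 1
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

A larger alpha makes the summed value react more slowly, so a tone must persist over several frames before the sum grows.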

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (said second set) when a tonal sound signal is detected .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (said second set) calculated in a previous frame in a SNR calculation (noise ratio) .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US20050240399A1
CLAIM 9
. A method according to claim 8 , wherein the selection of the length of the encoded frame is dependent on a signal to noise ratio (noise ratio, SNR LT, SNR calculation) of the frame .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (said second set) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (linear prediction) residual error energies .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US20050240399A1
CLAIM 6
. A method according to claim 2 , wherein the first excitation method is algebraic code excited linear prediction (linear prediction, residual error) excitation .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (said second set) when a music signal is detected .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (said second set) on the music signal .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050240399A1
CLAIM 3
. A method according to claim 1 , wherein said first set of parameters are based on energy levels of one or more frequency bands (frequency bands) associated with the frame .
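
Claim 24 of US8990073B2 above describes the spectral diversity parameter as a weighted sum of per-band energy ratios between the current and previous frame, taken over bands above a given number. A minimal Python sketch follows. The plain current-to-previous ratio, the uniform default weights, the zero ratio for an empty previous band, and all names are assumptions; the patent specification may define the ratio and weights differently.

```python
def spectral_diversity(e_cur, e_prev, band_limit, weights=None):
    """Weighted sum of energy ratios over bands with index > band_limit.

    e_cur, e_prev: per-band energies of the current and previous frame
    band_limit:    hypothetical band index above which bands are used
    weights:       optional per-band weights (uniform if omitted)
    """
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for i in range(band_limit + 1, len(e_cur)):
        ratio = e_cur[i] / e_prev[i] if e_prev[i] > 0.0 else 0.0
        total += weights[i] * ratio
    return total
```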

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (said second set) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (frequency bands) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20050240399A1
CLAIM 3
. A method according to claim 1 , wherein said first set of parameters are based on energy levels of one or more frequency bands (frequency bands) associated with the frame .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (said second set) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (energy levels) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20050240399A1
CLAIM 3
. A method according to claim 1 , wherein said first set of parameters are based on energy levels (frequency bin basis) of one or more frequency bands associated with the frame .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (signal processing device) .
US20050240399A1
CLAIM 28
. A terminal according to claim 27 , wherein said terminal is a signal processing device (sound activity detector) .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (signal processing device) comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20050240399A1
CLAIM 9
. A method according to claim 8 , wherein the selection of the length of the encoded frame is dependent on a signal to noise ratio (noise ratio, SNR LT, SNR calculation) of the frame .

US20050240399A1
CLAIM 28
. A terminal according to claim 27 , wherein said terminal is a signal processing device (sound activity detector) .
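The comparator recited in claim 38 of US8990073B2 can be illustrated with a minimal sketch. The claim only says that the threshold is a function of the long-term SNR; the linear form and the constants `slope` and `offset` below are assumptions made for illustration, and the function name `sad_decision` is hypothetical.

```python
def sad_decision(snr_av, snr_lt, slope=0.1, offset=2.0):
    """Return True when the frame is declared active.

    snr_av: average signal-to-noise ratio of the current frame.
    snr_lt: long-term signal-to-noise ratio.
    slope, offset: hypothetical constants; the claim does not specify
    the shape of the threshold function, so a linear one is assumed.
    """
    threshold = slope * snr_lt + offset  # threshold depends on long-term SNR
    return snr_av > threshold
```

With these assumed constants, a long-term SNR of 20 gives a threshold of 4, so a frame with an average SNR of 5 would be declared active and a frame with an average SNR of 3 would not.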

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (said second set) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (signal processing device) .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .

US20050240399A1
CLAIM 28
. A terminal according to claim 27 , wherein said terminal is a signal processing device (sound activity detector) .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (said second set) .
US20050240399A1
CLAIM 5
. A method according to claim 1 , wherein said second set (noise energy estimates) of parameters comprises at least one of spectral parameters , LTP parameters and correlation parameters associated with the frame .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1672618A1

Filed: 2004-10-04     Issued: 2006-06-21

Method for deciding time boundary for encoding spectrum envelope and frequency resolution

(Original Assignee) Panasonic Corp     (Current Assignee) Panasonic Corp

Kok Seng Chong, Sua Hong Neo, Naoya Tanaka, Takeshi Norimatsu
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, frequency bands) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (band signals, frequency bands) of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .
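The three steps of claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is an illustrative reading, not the patentee's implementation: the function name `residual_spectrum` is hypothetical, the floor is assumed to be piecewise linear between minima, and the first and last bins are assumed to count as minima so that the floor spans the whole spectrum.

```python
def residual_spectrum(spectrum):
    """Return (residual, minima) for a magnitude spectrum with >= 2 bins."""
    n = len(spectrum)
    # Step 1: search for local minima. Treating the two end bins as
    # minima is an assumption made so the floor covers every bin.
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)

    # Step 2: estimate the spectral floor by connecting successive
    # minima with straight lines (assumed interpolation).
    floor = [0.0] * n
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])

    # Step 3: subtract the floor from the spectrum.
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

For the spectrum [1, 3, 1, 5, 2] the minima fall at bins 0, 2 and 4, and the residual is zero at each minimum with one peak in between each pair.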

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (band signals, frequency bands) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .
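The per-peak correlation of claim 4 of US8990073B2 can be sketched in the same spirit. The claim requires a normalized correlation value computed over the bins between two consecutive minima and assigned to all of those bins; the specific normalization below (cross product divided by the geometric mean of the two energies) is an assumption, and `correlation_map` is a hypothetical name.

```python
import math


def correlation_map(cur, prev, minima):
    """Build a correlation map from two residual spectra.

    cur, prev: current and previous residual spectra (same length).
    minima: sorted bin indices of the minima of `cur` that delimit peaks.
    """
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        xs = cur[a:b + 1]
        ys = prev[a:b + 1]
        num = sum(x * y for x, y in zip(xs, ys))
        den = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
        # Score of the peak; zero when either segment has no energy.
        score = num / den if den > 0.0 else 0.0
        # Assign the score to every bin of the peak. A minimum shared by
        # two peaks keeps the score of the later peak (a design choice).
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap
```

A peak that keeps its shape and position from one frame to the next scores close to 1, while a peak with no counterpart in the previous residual spectrum scores 0.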

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (band signals, frequency bands) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (band signals, frequency bands) so as to produce a summed long-term correlation map .
EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .
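Claim 5 of US8990073B2 filters the correlation map through a one-pole filter, bin by bin, and sums the result. A minimal sketch, assuming the usual first-order recursive form with an update factor `alpha` (the value 0.9 and the name `update_long_term_map` are hypothetical):

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole update of the long-term correlation map.

    lt_map: long-term map from the previous frame (its initial value
            on the first frame).
    cmap:   correlation map of the current frame.
    alpha:  assumed update factor, 0 <= alpha < 1.
    Returns the new long-term map and its sum over all bins.
    """
    new_map = [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]
    return new_map, sum(new_map)
```

The summed value grows only when the same bins stay highly correlated over several frames, which is why it can serve as a tonal-stability indicator.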

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (band signals, frequency bands) having a magnitude that exceeds a given fixed threshold .
EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands (band signals, frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .
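The spectral diversity computation of claim 24 of US8990073B2 can be sketched as below. The claim recites a ratio of current-frame to previous-frame energy per band and a weighted sum over the bands above a given number; the uniform default weights, the small `eps` guard against division by zero, and the name `spectral_diversity` are assumptions for illustration.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands above min_band.

    cur_energy, prev_energy: per-band energies of the current and
                             previous frames.
    min_band: only bands with index > min_band contribute.
    weights:  per-band weights; uniform weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    for i in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / (prev_energy[i] + eps)  # eps avoids division by zero
        total += weights[i] * ratio
    return total
```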

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame from an end time border of a previous frame (activity prediction parameter) of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame from an end time border of a previous frame (activity prediction parameter) of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame from an end time border of a previous frame (activity prediction parameter) of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .
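The gating condition of claim 27 of US8990073B2 is a conjunction of two threshold tests. A minimal sketch, with a hypothetical function name and with the threshold values left to the caller because the claim does not state them:

```python
def allow_noise_update(act_pred, nonstat, thr_act, thr_nonstat):
    """Return False when the noise energy estimates must not be updated.

    The update is prevented only when BOTH parameters exceed their
    fixed thresholds at the same time, as the claim recites.
    """
    prevented = act_pred > thr_act and nonstat > thr_nonstat
    return not prevented
```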

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signals, frequency bands) into a first group (band signals, frequency bands) of a certain number of first frequency (band signals, frequency bands) bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signals, frequency bands) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
Otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, frequency bands) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, frequency bands) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (band signals, frequency bands) of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1672618A1
CLAIM 1
A method for determining a time border and a frequency resolution in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : deriving a start time border of a current frame (current frame) from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a transient time slot in spectral data between the start time border and the end time border within a predetermined allowed region , a degree of the transient exceeding a certain drasticness ;
and finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the end time border of the current frame within the predetermined allowed region by comparing the transient drasticness with a predetermined signal variation criterion .

EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (band signals, frequency bands) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (band signals, frequency bands) so as to produce a summed long-term correlation map .
EP1672618A1
CLAIM 10
The method for determining the time border and the frequency resolution according to Claim 2 wherein the signal variation criterion is evaluated by computing ratios between the energies of the frequency bands (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) for every time segment found , and when minimum of the ratios exceeds a threshold , a high frequency resolution is adopted ;
otherwise , a low frequency resolution is adopted .

EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals (first frequency, frequency bands, first group, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .
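
For orientation, claim 33 of US8990073B2 recites a filter applied to the correlation map bin by bin, followed by a summation over the frequency bins. The following is a minimal sketch of that step, not the patented implementation. It assumes a first-order recursive filter and an update factor `alpha`; the filter type, the value of `alpha`, and all function and variable names are assumptions that do not appear in the chart text.

```python
def filter_and_sum(long_term_map, correlation_map, alpha=0.9):
    """Filter the correlation map per frequency bin, then sum over bins.

    long_term_map   -- long-term correlation map from the previous frame
    correlation_map -- correlation map of the current frame
    alpha           -- update factor (assumed value, not from the source)
    """
    # Per-bin first-order recursive filter (assumed filter type).
    updated = [
        alpha * previous + (1.0 - alpha) * current
        for previous, current in zip(long_term_map, correlation_map)
    ]
    # Summation over the frequency bins gives a single scalar per frame.
    return updated, sum(updated)


if __name__ == "__main__":
    initial = [0.0, 0.0, 0.0]  # initial value of the long-term map
    updated, total = filter_and_sum(initial, [1.0, 0.0, 1.0], alpha=0.5)
    print(updated, total)
```

A larger summed value would indicate that spectral peaks keep their shape from frame to frame; any decision threshold applied to it is outside what this chart recites.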

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (domain signal) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
EP1672618A1
CLAIM 12
A method for determining a time border and a frequency resolution by a bandwidth expansion technology in spectral envelope coding of an audio signal utilizing a time/frequency grid , said method comprising : transforming the audio signal into a plurality of low-frequency subband signals by an analysis filterbank ;
replicating portions of the subband signal to a high-frequency region , dividing the replicated subbands into time segments using time borders information and subsequently into frequency bands using frequency resolutions information , and subsequently adjusting the subbands by envelope data ;
and transforming the low-frequency subband signals and the envelope-adjusted subband signals into a bandwidth-expanded time domain signal (noise estimator) , wherein said method further comprising : deriving a start time border from an end time border of a previous frame of envelope data ;
detecting , by a transient detector , a most drastic transient time slot in spectral data between the start time border and furthest allowed end time border ;
finding and instantiating an actual end time border and intermediate time borders in the spectral data between the transient time slot and the furthest allowed end time border by evaluating a signal variation criterion ;
and deriving the frequency resolution by evaluating energy of every frequency band partitioned by low-resolution borders for every time segment obtained by the dividing of the replicated subbands .




WO2005041169A2

Filed: 2004-08-13     Issued: 2005-05-06

Method and system for speech coding

(Original Assignee) Nokia Corporation; Nokia Inc.     

Anssi RÄMÖ, Jani Nurminen, Sakari Himanen, Ari Heikkinen
US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (signal data) .
WO2005041169A2
CLAIM 15
. The method of claim 1 , characterized in that the audio signal is encoded into audio signal data (SNR calculation) , said method further characterized by forming a parameter signal based on the audio signal data having a first number of signal data ;
downsampling the parameter signal to a second number of signal data for providing a further parameter signal , wherein the second number is smaller than the first number ;
and upsampling the further parameter signal to a third number of signal data in decoding , wherein the third number is greater than the second number .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update (base stations) of noise energy estimates on the music signal .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .

WO2005041169A2
CLAIM 31
. A communication network , characterized by : a plurality of base stations (prevent update) ;
and a plurality of mobile stations adapted to communicating with the base stations , wherein at least one of the mobile stations comprises : a decoder for generating a synthesized audio signal indicative of an audio signal having audio characteristics , wherein the audio signal is coded in a coding step into a plurality of parameters at a data rate , and the coding step is adjusted based on the characteristics of the audio characteristics of audio signals for providing an adjusted representation of the parameters ;
and an input for receiving audio data indicative of the parameters in the adjusted representation from at least one of the base stations for providing the audio data to the decoder , so as to allow the decoder to generate the synthesized audio signal based on the adjusted representation .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (first number) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .

WO2005041169A2
CLAIM 15
. The method of claim 1 , characterized in that the audio signal is encoded into audio signal data , said method further characterized by forming a parameter signal based on the audio signal data having a first number (first frequency) of signal data ;
downsampling the parameter signal to a second number of signal data for providing a further parameter signal , wherein the second number is smaller than the first number ;
and upsampling the further parameter signal to a third number of signal data in decoding , wherein the third number is greater than the second number .
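
For orientation, claim 28 of US8990073B2 recites four steps: split the frequency bands into two groups, compute one energy value per group, take the ratio, and track a long-term value of that ratio. The following is a minimal sketch under stated assumptions: each group's energy value is taken as a plain sum of band energies, and the long-term value is an exponential moving average with factor `alpha`. Neither choice, nor any of the names used, comes from the chart text.

```python
def noise_character(band_energies, n_first):
    """Ratio of the first group's energy to the energy of the remaining bands."""
    first_energy = sum(band_energies[:n_first])    # first group of bands
    second_energy = sum(band_energies[n_first:])   # rest of the bands
    # Guard against division by zero when the second group is silent.
    return first_energy / max(second_energy, 1e-12)


def long_term_value(previous, current, alpha=0.9):
    """Exponential moving average (assumed form of the long-term value)."""
    return alpha * previous + (1.0 - alpha) * current


if __name__ == "__main__":
    ratio = noise_character([1.0, 1.0, 2.0, 4.0], n_first=2)
    print(ratio, long_term_value(0.0, ratio))
```

Claim 29 then compares the noise character parameter against a fixed threshold to decide whether the noise energy estimates may be updated; the threshold value is not given in the chart.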

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
WO2005041169A2
CLAIM 9
. The method of claim 8 , characterized in that the plurality of values includes a value designated to a voiced speech signal (noise character parameter, activity prediction parameter) and another value designated to an unvoiced signal .




KR20060025203A

Filed: 2004-06-25     Issued: 2006-03-20

Improving the quality of decoded audio by adding noise

(Original Assignee) Koninklijke Philips Electronics N.V.     

Albertus C. den Brinker, François P. Myburg
US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (estimates) for a next frame .
KR20060025203A
CLAIM 5
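The method of encoding an audio signal (x) according to any one of claims 1 to 4 , wherein the transformation parameters (b2) represent estimates (noise estimates, noise estimator) of the amplitudes of sinusoidal components of the audio signal (x) .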
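
For orientation, claims 15 and 16 of US8990073B2 recite an SNR calculation that uses noise energy estimates from the previous frame, followed by an update of those estimates for the next frame. The following is a minimal sketch under stated assumptions: the SNR is the mean per-band energy ratio expressed in dB, the decision threshold is 10 dB, and the noise estimates are updated by exponential smoothing only on frames judged inactive. All of these choices and all names are assumptions; the chart text specifies none of them.

```python
import math


def snr_activity_step(band_energies, noise_estimates,
                      threshold_db=10.0, alpha=0.9):
    """One frame of SNR-based activity detection (illustrative only).

    noise_estimates are the values carried over from the previous frame.
    Both lists are assumed non-empty and of equal length.
    Returns (active, noise_estimates_for_next_frame).
    """
    eps = 1e-12
    ratios = [e / max(n, eps) for e, n in zip(band_energies, noise_estimates)]
    snr_db = 10.0 * math.log10(max(sum(ratios) / len(ratios), eps))
    active = snr_db > threshold_db
    if active:
        # Assumed policy: freeze the noise estimates on active frames.
        next_noise = list(noise_estimates)
    else:
        # Assumed policy: smooth towards the current band energies.
        next_noise = [alpha * n + (1.0 - alpha) * e
                      for e, n in zip(band_energies, noise_estimates)]
    return active, next_noise


if __name__ == "__main__":
    print(snr_activity_step([100.0, 100.0], [1.0, 1.0]))
    print(snr_activity_step([2.0, 2.0], [1.0, 1.0]))
```

The dependent claims charted elsewhere in this document (21, 27, 29) add further conditions under which the update is prevented, for example on music signals; those conditions are not modelled here.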

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (characteristics) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
KR20060025203A
CLAIM 1
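A method of encoding an audio signal (x) , wherein a code signal (b1) is generated from the audio signal (x) according to a predefined coding method (201) , the method comprising : - transforming (207) the audio signal (x) into a set of transformation parameters (b2) defining at least a part of the spectro-temporal information in the audio signal (x) and enabling generation of a noise signal having spectro-temporal characteristics (first group) substantially similar to those of the audio signal ;
and - representing the audio signal (x) by the code signal (b1) and the transformation parameters (b2) .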

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (estimates) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
KR20060025203A
CLAIM 5
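The method of encoding an audio signal (x) according to any one of claims 1 to 4 , wherein the transformation parameters (b2) represent estimates (noise estimates, noise estimator) of the amplitudes of sinusoidal components of the audio signal (x) .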




WO2004114133A1

Filed: 2004-06-25     Issued: 2004-12-29

Method to diagnose equipment status

(Original Assignee) Abb Research Ltd.     

Magnus HÖGSTEDT, Dominique Blanc, Patrik Nordling, Marko Lehtola, Tommy Kettu
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map (additional event) .
WO2004114133A1
CLAIM 14
. A method according to any previous claim , characterised in that an existing pattern may be reconfigured by a user by editing any of the events and/or alarms and/or by adding additional events (term correlation map) and/or alarms to the existing pattern .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (displaying information) parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
WO2004114133A1
CLAIM 5
. A method according to any previous claim , characterised by displaying information (noise character) about the identified cause of an alarm and/or event is based on a recognised pattern .

WO2004114133A1
CLAIM 26
. A graphical user interface for displaying a cause of one or more alarm and/or event messages generated by an equipment over a time period , examining a list of alarm and/or event messages for said time (noise character parameter) period , characterised in that a display of a cause of one or more said alarms and/or events is provided by means of a method according to any of claims 1-18 .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (displaying information) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2004114133A1
CLAIM 5
. A method according to any previous claim , characterised by displaying information (noise character) about the identified cause of an alarm and/or event is based on a recognised pattern .

WO2004114133A1
CLAIM 26
. A graphical user interface for displaying a cause of one or more alarm and/or event messages generated by an equipment over a time period , examining a list of alarm and/or event messages for said time (noise character parameter) period , characterised in that a display of a cause of one or more said alarms and/or events is provided by means of a method according to any of claims 1-18 .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (displaying information) parameter inferior than a given fixed threshold .
WO2004114133A1
CLAIM 5
. A method according to any previous claim , characterised by displaying information (noise character) about the identified cause of an alarm and/or event is based on a recognised pattern .

WO2004114133A1
CLAIM 26
. A graphical user interface for displaying a cause of one or more alarm and/or event messages generated by an equipment over a time period , examining a list of alarm and/or event messages for said time (noise character parameter) period , characterised in that a display of a cause of one or more said alarms and/or events is provided by means of a method according to any of claims 1-18 .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (displaying information) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
WO2004114133A1
CLAIM 5
. A method according to any previous claim , characterised by displaying information (noise character) about the identified cause of an alarm and/or event is based on a recognised pattern .




US20050278171A1

Filed: 2004-06-15     Issued: 2005-12-15

Comfort noise generator using modified doblinger noise estimate

(Original Assignee) Acoustic Technologies Inc     (Current Assignee) Cirrus Logic Inc

Seth Suppappola, Samuel Ebenezer, Justin Allen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .
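
As a reading aid for the US8990073B2 claim 2 steps charted above (search for minima, connect them into a spectral floor, subtract the floor), a minimal Python sketch follows. It assumes straight-line interpolation between successive minima and treats the first and last bins as minima; neither detail is stated in the claim, and all function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are included by assumption."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    interior = [
        i for i in range(1, len(spectrum) - 1)
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]
    ]
    return [0] + interior + [len(spectrum) - 1]


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor, plus the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

Under these assumptions the residual is zero at every minimum, so each stretch between two successive minima can be treated as one peak.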

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (spectral gain) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20050278171A1
CLAIM 3
. The telephone as set forth in claim 1 and further including a circuit for limiting spectral gain (frequency bins) in said circuit for calculating a noise estimate .
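
The correlation map of US8990073B2 claim 4 above can be illustrated with a short Python sketch. The claim only says "normalized correlation value"; the cosine-style normalization used here is an assumption, as are the function name and the choice to let a bin shared by two peaks take the score of the later peak.

```python
import math


def correlation_map(current, previous, minima):
    """Score each peak of `current` against the same bins of `previous`.

    current, previous: residual spectra of equal length.
    minima: sorted indices of minima in `current`; each consecutive pair
    delimits one peak.
    """
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0  # assumed normalization
        # Assign the peak's score to every bin between its delimiting minima.
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap
```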

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (spectral gain) so as to produce a summed long-term correlation map .
US20050278171A1
CLAIM 3
. The telephone as set forth in claim 1 and further including a circuit for limiting spectral gain (frequency bins) in said circuit for calculating a noise estimate .
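
The one-pole filtering and summation of US8990073B2 claim 5 above, together with the update factor and initial value named in claim 1, can be sketched as follows. The recursion form, the default initial value of 0.0, and the function names are assumptions for illustration; the claims do not give numeric values.

```python
def init_long_term_map(num_bins, initial_value=0.0):
    """Start the long-term map from an initial value (0.0 is an assumed default)."""
    return [initial_value] * num_bins


def update_long_term_map(long_term, cmap, alpha):
    """One-pole filter per frequency bin; `alpha` plays the role of the update factor."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Calling `update_long_term_map` once per frame keeps a running, per-bin average of the correlation map; the sum gives a single number per frame that can be compared against a threshold.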

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (spectral gain) having a magnitude that exceeds a given fixed threshold .
US20050278171A1
CLAIM 3
. The telephone as set forth in claim 1 and further including a circuit for limiting spectral gain (frequency bins) in said circuit for calculating a noise estimate .
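
The search recited in US8990073B2 claim 7 above amounts to a per-bin threshold test. A minimal Python sketch, with a hypothetical function name and no claim-specified threshold value:

```python
def strong_tone_bins(cmap, threshold):
    """Return indices of correlation-map bins whose magnitude exceeds the fixed threshold."""
    return [i for i, value in enumerate(cmap) if abs(value) > threshold]
```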

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (speech detector) .
US20050278171A1
CLAIM 4
. The telephone as set forth in claim 3 and further including a speech detector (sound activity detection) , wherein the spectral gain limit is higher when speech is detected than when speech is not detected .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20050278171A1
CLAIM 4
. The telephone as set forth in claim 3 and further including a speech detector (sound activity detection) , wherein the spectral gain limit is higher when speech is detected than when speech is not detected .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20050278171A1
CLAIM 4
. The telephone as set forth in claim 3 and further including a speech detector (sound activity detection) , wherein the spectral gain limit is higher when speech is detected than when speech is not detected .
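
The comparison in US8990073B2 claim 14 above can be sketched in Python. The claim says only that the threshold is "a function of" the long-term SNR; the linear function and its `slope` and `offset` defaults below are purely hypothetical placeholders.

```python
def sad_decision(snr_av, snr_lt, slope=0.1, offset=10.0):
    """Flag sound activity when the average SNR exceeds an SNR_LT-dependent threshold.

    slope and offset are illustrative assumptions, not values from the patent.
    """
    threshold = slope * snr_lt + offset
    return snr_av > threshold
```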

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise power) .
US20050278171A1
CLAIM 4
. The telephone as set forth in claim 3 and further including a speech detector (sound activity detection) , wherein the spectral gain limit is higher when speech is detected than when speech is not detected .

US20050278171A1
CLAIM 9
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the noise power (SNR LT, SNR calculation) estimate from one frame with the noise power estimate from another frame .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) further comprises updating the noise estimates for a next frame .
US20050278171A1
CLAIM 4
. The telephone as set forth in claim 3 and further including a speech detector (sound activity detection) , wherein the spectral gain limit is higher when speech is detected than when speech is not detected .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .
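
For US8990073B2 claim 24 above, a minimal Python sketch of the spectral diversity calculation: a per-band ratio of current-frame to previous-frame energy, restricted to bands above a given index, combined as a weighted sum. Uniform weights, the small `eps` guard against division by zero, and all names are assumptions not found in the claim.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands above `min_band`."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed uniform weighting
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[band] / max(prev_energy[band], eps)
        total += weights[band] * ratio
    return total
```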

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame to the noise power estimate from the previous frame (activity prediction parameter) with a threshold .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame to the noise power estimate from the previous frame (activity prediction parameter) with a threshold .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame to the noise power estimate from the previous frame (activity prediction parameter) with a threshold .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20050278171A1
CLAIM 10
. The telephone as set forth in claim 1 wherein said circuit for calculating an estimate includes a comparator for comparing the ratio of the noise power estimate from the current frame (current frame) to the noise power estimate from the previous frame with a threshold .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (spectral gain) so as to produce a summed long-term correlation map .
US20050278171A1
CLAIM 3
. The telephone as set forth in claim 1 and further including a circuit for limiting spectral gain (frequency bins) in said circuit for calculating a noise estimate .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2005195955A

Filed: 2004-01-08     Issued: 2005-07-21

Noise suppression device and noise suppression method

(Original Assignee) Toshiba Corp; 株式会社東芝     

Ko Amada, Akinori Kawamura, Akinori Koshiba, 皇 天田, 亮典 小柴, 聡典 河村
US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (subband) is detected .
JP2005195955A
CLAIM 10
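A noise suppression device that suppresses a noise signal from a plurality of input signals in which the noise signal and a target signal are mixed , the device comprising : subband integrated signal generation means for generating , from the plurality of input signals , a subband (tonal sound signal) integrated signal in which the target signal is emphasized for each frequency band ; noise estimation means for estimating a noise signal component for each subband from the subband integrated signal ; section determination means for determining a target signal section and a noise signal section for each subband from the plurality of input signals ; noise suppression means for subtracting the estimated noise signal component from the subband integrated signal for each subband based on the determination result of the section determination means ; and synthesis means for synthesizing the output signals of the noise suppression means for the respective subbands .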

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (switching) parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
JP2005195955A
CLAIM 3
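The noise suppression device according to claim 2 , further comprising : correction signal generation means for generating a correction signal by multiplying the input signal by a coefficient that corrects the level difference from the noise signal remaining in the output of the target signal section ; and addition means for adding the correction signal and the output of the noise over-suppression means , wherein the switching means performs switching (noise character) between the output signal of the noise suppression means and the output signal of the addition means .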

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (determination) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
JP2005195955A
CLAIM 1
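A noise suppression device that suppresses a noise signal from an input signal in which the noise signal and a target signal are mixed , the device comprising : noise estimation means for estimating a noise signal component from the input signal ; section determination means for determining a target signal section and a noise signal section from the input signal ; and noise suppression means for subtracting the estimated noise signal component from the input signal based on the determination (binary decision) result of the section determination means .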

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (switching) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
JP2005195955A
CLAIM 3
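The noise suppression device according to claim 2 , further comprising : correction signal generation means for generating a correction signal by multiplying the input signal by a coefficient that corrects the level difference from the noise signal remaining in the output of the target signal section ; and addition means for adding the correction signal and the output of the noise over-suppression means , wherein the switching means performs switching (noise character) between the output signal of the noise suppression means and the output signal of the addition means .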

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (switching) parameter inferior than a given fixed threshold .
JP2005195955A
CLAIM 3
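The noise suppression device according to claim 2 , further comprising : correction signal generation means for generating a correction signal by multiplying the input signal by a coefficient that corrects the level difference from the noise signal remaining in the output of the target signal section ; and addition means for adding the correction signal and the output of the noise over-suppression means , wherein the switching means performs switching (noise character) between the output signal of the noise suppression means and the output signal of the addition means .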
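
The noise character parameter of US8990073B2 claims 28 and 29 above can be illustrated with a short Python sketch: split the bands into two groups, take the ratio of the group energies, smooth it over time, and block the noise update when the smoothed value falls below a fixed threshold. The split index, the one-pole smoothing form, the `eps` guard, and all names are assumptions; the claims give no numeric values.

```python
def noise_character(band_energies, split, eps=1e-12):
    """Ratio of the energy in the first `split` bands to the energy in the remaining bands."""
    first = sum(band_energies[:split])
    second = sum(band_energies[split:])
    return first / max(second, eps)


def update_long_term(long_term, value, alpha):
    """Assumed one-pole smoothing used as the 'long-term value' of the parameter."""
    return alpha * long_term + (1.0 - alpha) * value


def noise_update_allowed(long_term_value, threshold):
    """Per claim 29, the update is prevented when the parameter is below the threshold."""
    return long_term_value >= threshold
```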

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (switching) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JP2005195955A
CLAIM 3
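The noise suppression device according to claim 2 , further comprising : correction signal generation means for generating a correction signal by multiplying the input signal by a coefficient that corrects the level difference from the noise signal remaining in the output of the target signal section ; and addition means for adding the correction signal and the output of the noise over-suppression means , wherein the switching means performs switching (noise character) between the output signal of the noise suppression means and the output signal of the addition means .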




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1735928A

Filed: 2003-12-22     Issued: 2006-02-15

Method for variable-rate audio encoding and decoding

(Original Assignee) 法国电信公司     

巴拉兹·科弗西, 多米尼克·马萨卢
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (N bits) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
CN1735928A
CLAIM 1
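. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits (frequency bins) of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .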

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (N bits) so as to produce a summed long-term correlation map .
CN1735928A
CLAIM 1
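. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits (frequency bins) of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .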

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (N bits) having a magnitude that exceeds a given fixed threshold .
CN1735928A
CLAIM 1
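. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits (frequency bins) of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .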

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (calculated) .
CN1735928A
CLAIM 6
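. The method according to claim 5 , wherein the parameters of the second subset relate to spectral bands of the signal , wherein a spectral envelope of the coded signal is estimated based on the coded parameters of the first subset , wherein a frequency masking curve is calculated (SNR calculation) by applying an auditory perception model to the estimated spectral envelope , and wherein the psychoacoustic criterion refers to the level of the estimated spectral envelope relative to the masking curve in each spectral band .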

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (these parameters) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
CN1735928A
CLAIM 1
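. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters (noise character parameter) on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .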

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (these parameters) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1735928A
CLAIM 1
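. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters (noise character parameter) on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .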

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (these parameters) inferior than a given fixed threshold .
CN1735928A
CLAIM 1
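. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters (noise character parameter) on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .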

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (N bits) so as to produce a summed long-term correlation map .
CN1735928A
CLAIM 1
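. A method of encoding a digital audio signal frame (S) as a binary output sequence (Φ) , wherein a maximum number Nmax of coding bits is defined for a set of parameters that can be calculated from the signal frame , the set comprising a first subset and a second subset , the method comprising the following steps : - calculating the parameters of the first subset , and coding these parameters on N0 coding bits such that N0<Nmax ; - determining an allocation of Nmax-N0 coding bits for the parameters of the second subset ; and - ranking the Nmax-N0 coding bits allocated to the parameters of the second subset in a determined order , wherein the allocation and/or the ranking order of the Nmax-N0 coding bits is determined according to the coded parameters of the first subset , and wherein , in response to an indication of N bits (frequency bins) of the binary output sequence , the N bits being available for the coding of the set of parameters , with N0<N≤Nmax , the method further comprises the following steps : - selecting parameters of the second subset to which the first N-N0 coding bits ranked in said order are allocated ; - calculating the selected parameters of the second subset , and coding these parameters so as to produce the first N-N0 coding bits of said ranking ; and - inserting the N0 coding bits of the first subset and the N-N0 coding bits of the selected parameters of the second subset into the output sequence .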




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
WO2004040830A1

Filed: 2003-10-30     Issued: 2004-05-13

Variable rate speech codec

(Original Assignee) Nokia Corporation     

Jari MÄKINEN, Pasi Ojala
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (limit values) of the long term correlation map .
WO2004040830A1
CLAIM 2
. A method as claimed in claim 1 , characterized in that responsive to changes in the channel conditions in the telecommunications network and/or in the active codec mode set , the parameters to be used in the speech codec mode selection and the limit values (initial value, correlation value, term value, first energy value) thereof are adapted to correspond to new channel conditions and capacity of the telecommunications network and/or to the active codec mode set .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (limit values) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2004040830A1
CLAIM 2
. A method as claimed in claim 1 , characterized in that responsive to changes in the channel conditions in the telecommunications network and/or in the active codec mode set , the parameters to be used in the speech codec mode selection and the limit values (initial value, correlation value, term value, first energy value) thereof are adapted to correspond to new channel conditions and capacity of the telecommunications network and/or to the active codec mode set .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (residual error) energies .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error (residual error) in coding will be minimized at the same time .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (limit values) for the first group of frequency bands and a second energy value (different rates) of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .

WO2004040830A1
CLAIM 2
. A method as claimed in claim 1 , characterized in that responsive to changes in the channel conditions in the telecommunications network and/or in the active codec mode set , the parameters to be used in the speech codec mode selection and the limit values (initial value, correlation value, term value, first energy value) thereof are adapted to correspond to new channel conditions and capacity of the telecommunications network and/or to the active codec mode set .

WO2004040830A1
CLAIM 8
. A variable rate speech codec comprising a plurality of speech codec modes operating at different rates (second energy, second energy value) and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the speech codec being arranged to receive information from the telecommunications network on an active codec mode set to be supported , to activate the speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized in that the speech codec is also arranged to encode the speech signals to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is arranged for adaption to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications networks , the level of residual error in coding will be minimized at the same time .
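For orientation, the steps recited in claim 28 of US8990073B2 (split the bands into two groups, compute one energy value per group, take their ratio, then track a long-term value) can be sketched as follows. This is only an illustrative reading of the claim language: the function name, the use of a plain sum as the "energy value", the choice of the first group as numerator, and the exponential smoothing with factor `alpha` are assumptions and are not taken from the patent or from the cited reference.

```python
def noise_character(band_energies, n_first, prev_long_term, alpha=0.9):
    """Return (ratio, long_term) for one frame.

    band_energies  : per-band energies of the current frame
    n_first        : number of bands placed in the first group (assumed)
    prev_long_term : long-term value carried over from the previous frame
    alpha          : smoothing factor (hypothetical, not from the claim)
    """
    # Energy value of each group, taken here as a plain sum (assumption).
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])

    # Ratio between the first and second energy values; the small floor
    # only guards against division by zero.
    ratio = first_energy / max(second_energy, 1e-12)

    # Long-term value as a first-order recursive average (assumption).
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term


ratio, long_term = noise_character([1.0, 1.0, 2.0, 4.0], n_first=2, prev_long_term=0.0)
```

In the usage line, the first two bands form the first group and the remaining two form the second group.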

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
WO2004040830A1
CLAIM 1
. A method for performing variable rate speech coding in a speech codec comprising a plurality of speech codec modes operating at different bit rates and speech encoded by said speech codec being arranged for transmission in a telecommunications network , the method comprising : receiving information on an active codec mode set to be supported from the telecommunications network , activating the speech-codec-supported speech codec modes that correspond to the active codec mode set determined in the telecommunications network , characterized by encoding speech signals (noise character parameter, activity prediction parameter) to be applied to the speech codec with said activated speech codec modes such that a speech codec mode of the substantially lowest bit rate is adapted to speech frames comprised by the speech signals such that , in view of the channel conditions in the telecommunications network , the level of residual error in coding will be minimized at the same time .
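Claims 27 and 29 of US8990073B2 both recite conditions under which the update of the noise energy estimates is prevented. A minimal sketch of that gating logic, assuming the two conditions are combined in a single decision function, is given below. The function name, parameter names, and all threshold values are hypothetical; the claims only state that the thresholds are fixed.

```python
def allow_noise_update(act_pred, nonstat, noise_char,
                       th_act, th_nonstat, th_char):
    """Return False when either recited condition blocks the update."""
    # Claim 27: activity prediction AND complementary non-stationarity
    # are simultaneously larger than their fixed thresholds.
    if act_pred > th_act and nonstat > th_nonstat:
        return False
    # Claim 29: noise character parameter is below a fixed threshold.
    if noise_char < th_char:
        return False
    return True


# Hypothetical values chosen only to exercise both branches.
blocked = allow_noise_update(0.9, 6.0, 2.0, th_act=0.8, th_nonstat=5.0, th_char=1.0)
allowed = allow_noise_update(0.9, 1.0, 2.0, th_act=0.8, th_nonstat=5.0, th_char=1.0)
```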

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (limit values) of the long-term correlation map .
WO2004040830A1
CLAIM 2
. A method as claimed in claim 1 , characterized in that responsive to changes in the channel conditions in the telecommunications network and/or in the active codec mode set , the parameters to be used in the speech codec mode selection and the limit values (initial value, correlation value, term value, first energy value) thereof are adapted to correspond to new channel conditions and capacity of the telecommunications network and/or to the active codec mode set .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (limit values) of the long-term correlation map .
WO2004040830A1
CLAIM 2
. A method as claimed in claim 1 , characterized in that responsive to changes in the channel conditions in the telecommunications network and/or in the active codec mode set , the parameters to be used in the speech codec mode selection and the limit values (initial value, correlation value, term value, first energy value) thereof are adapted to correspond to new channel conditions and capacity of the telecommunications network and/or to the active codec mode set .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20040128126A1

Filed: 2003-10-14     Issued: 2004-07-01

Preprocessing of digital audio data for mobile audio codecs

(Original Assignee) WILDERTHANCOM Co Ltd     (Current Assignee) WILDERTHANCOM Co Ltd ; Realnetworks Asia Pacific Co Ltd

Young Nam, Seop Park, Tae Ha, Yun Jeon
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (automatic gain control) indicative of sound activity (signal level) in the sound signal .
US20040128126A1
CLAIM 2
. A method for preprocessing audio data to be processed by a codec having variable coding rate , comprising the steps of : classifying the audio data based on the characteristic of the audio data ;
in case the audio data includes monophonic sound , performing AGC (automatic gain control (adaptive threshold) ) preprocessing of all frames ;
and in case the audio data includes polyphonic sound , performing AGC preprocessing of selected frames .

US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 10
. A method for detecting sound activity (signal level) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (signal level) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection further comprises updating the noise estimates for a next frame .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 35
. A device for detecting sound activity (signal level) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 36
. A device for detecting sound activity (signal level) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (signal level) detector .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (signal level) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (signal level) detector .
US20040128126A1
CLAIM 5
. A method in accordance with claim 4 , wherein the adjusting step comprises the steps of : calculating signal levels (sound activity, sound activity detection, detecting sound activity) of the audio data ;
deciding smoothed gain coefficients based on signal levels ;
and generating preprocessed audio data by multiplying the smoothed gain coefficients to the audio data in the decided interval .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2005110127A

Filed: 2003-10-01     Issued: 2005-04-21

Wind noise detection device and video camera device having the same (風雑音検出装置及びそれを有するビデオカメラ装置)

(Original Assignee) Canon Inc; キヤノン株式会社     

Katsutoshi Takahashi, 克寿 高橋
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2005110127A
CLAIM 3
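The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .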

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2005110127A
CLAIM 3
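The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .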
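The tonal stability steps recited in claims 1 and 2 of US8990073B2 can be pictured with the rough sketch below: a spectral floor is formed by connecting the minima of the spectrum, the floor is subtracted to give a residual spectrum, peaks are taken as the pieces between successive minima of the residual, each peak is correlated with the same position in the previous residual, and a long-term map is updated. The function names, the squared normalized correlation, the linear interpolation of the floor, and the recursive update with factor `alpha` are assumptions made for illustration; the claims do not fix these details.

```python
def local_minima(x):
    """Indices of local minima, always including both end points.

    Assumes x has at least two bins.
    """
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a floor obtained by linearly connecting the minima."""
    mins = local_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(mins, mins[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


def correlation_map(cur_res, prev_res):
    """One correlation value per peak, written to every bin of the peak."""
    cmap = [0.0] * len(cur_res)
    mins = local_minima(cur_res)
    for a, b in zip(mins, mins[1:]):  # each piece between minima is a peak
        bins = range(a, b + 1)
        num = sum(cur_res[k] * prev_res[k] for k in bins) ** 2
        den = (sum(cur_res[k] ** 2 for k in bins)
               * sum(prev_res[k] ** 2 for k in bins))
        value = num / den if den > 0 else 0.0
        for k in bins:  # a bin shared by two peaks keeps the later value
            cmap[k] = value
    return cmap


def update_long_term(lt_map, cmap, alpha=0.9):
    """Recursive update with a hypothetical update factor alpha."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


prev_res = residual_spectrum([1.0, 3.0, 1.0, 4.0, 1.0])
cur_res = residual_spectrum([1.0, 3.0, 1.0, 4.0, 1.0])
lt_map = update_long_term([0.0] * 5, correlation_map(cur_res, prev_res))
```

In the usage lines the long-term map starts from an initial value of zero in every bin, which is itself an assumption. A spectrum whose peaks stay in place from frame to frame drives the map upward, which is the sense in which the map indicates tonal stability.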

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (音声信号) .
JP2005110127A
CLAIM 3
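The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .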

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JP2005110127A
CLAIM 3
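The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .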

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (風雑音検出) in the sound signal .
JP2005110127A
CLAIM 1
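A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .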

JP2005110127A
CLAIM 3
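The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .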
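Claims 7 and 8 of US8990073B2 describe two ways of detecting strong tones: scanning the correlation map for bins above a fixed threshold, and comparing the summed long-term correlation map with an adaptive threshold. The sketch below shows both tests in their simplest form. The function names are hypothetical, and the way the adaptive threshold is adapted is deliberately left out because the claims shown here do not specify it; it is passed in as a plain argument.

```python
def strong_tone_bins(cmap, fixed_threshold):
    """Claim 7 reading: bins whose correlation exceeds a fixed threshold."""
    return [k for k, value in enumerate(cmap) if value > fixed_threshold]


def is_tonal(lt_map, adaptive_threshold):
    """Claim 8 reading: summed long-term map compared with a threshold.

    The caller is assumed to maintain adaptive_threshold elsewhere.
    """
    return sum(lt_map) > adaptive_threshold


# Hypothetical map values and thresholds, for illustration only.
bins = strong_tone_bins([0.2, 0.95, 0.5, 0.99], fixed_threshold=0.9)
flag = is_tonal([0.5, 0.5, 0.5], adaptive_threshold=1.0)
```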

US8990073B2
CLAIM 10
. A method for detecting sound activity (風雑音検出) in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JP2005110127A
CLAIM 1
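A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .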

JP2005110127A
CLAIM 3
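The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .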

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound (風雑音検出) signal is detected .
JP2005110127A
CLAIM 1
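A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .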

JP2005110127A
CLAIM 3
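The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .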

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (風雑音検出) in the sound signal (音声信号) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JP2005110127A
CLAIM 1
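A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .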

JP2005110127A
CLAIM 3
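The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .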

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (風雑音検出) detection comprises detecting the sound signal (音声信号) based on a frequency dependent signal-to-noise ratio (SNR) .
JP2005110127A
CLAIM 1
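A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .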

JP2005110127A
CLAIM 3
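The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .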

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (風雑音検出) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
JP2005110127A
CLAIM 1
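A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .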

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (風雑音検出) detection in the sound signal (音声信号) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JP2005110127A
CLAIM 1
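A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .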

JP2005110127A
CLAIM 3
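The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .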

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (風雑音検出) detection further comprises updating the noise estimates for a next frame .
JP2005110127A
CLAIM 1
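A wind noise detection (風雑音検出: tonal sound, sound activity, sound activity detector, detecting sound activity) device comprising a plurality of sound collecting means and detecting wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal converted by the first and second absolute value converting means and outputting it as a wind noise detection signal .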

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (音声信号) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2005110127A
CLAIM 3
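The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .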

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JP2005110127A
CLAIM 3
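The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .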

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JP2005110127A
CLAIM 3
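The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .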
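Claims 14, 18 and 19 of US8990073B2 together describe a decision rule: the average SNR is compared with a threshold that depends on the long-term SNR, and the frame is classified as inactive below the threshold and active above it. A minimal sketch of that rule follows. The linear threshold function and its `slope` and `offset` constants are purely hypothetical, since the claims say only that the threshold is "a function of" the long-term SNR, and the claims do not say how the equality case is handled (this sketch treats it as inactive).

```python
def vad_threshold(snr_lt, slope=0.1, offset=2.0):
    """Hypothetical threshold that grows linearly with the long-term SNR."""
    return offset + slope * snr_lt


def classify_frame(snr_av, snr_lt):
    """Return 'active' when the average SNR exceeds the threshold."""
    if snr_av > vad_threshold(snr_lt):
        return "active"
    return "inactive"  # includes the equality case, by assumption


# Hypothetical SNR values in dB.
label_loud = classify_frame(snr_av=10.0, snr_lt=20.0)
label_quiet = classify_frame(snr_av=3.0, snr_lt=20.0)
```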

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (音声信号) prevents updating of noise energy estimates when a music signal is detected .
JP2005110127A
CLAIM 3
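The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .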

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (音声信号) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JP2005110127A
CLAIM 3
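The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .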
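The spectral diversity computation of claim 24 of US8990073B2 (a per-band ratio of current-frame to previous-frame energy, summed with weights over the bands above a given band number) might be sketched as below. The function and parameter names are hypothetical, the weights default to 1.0 because the claim does not give them, and the plain current-over-previous ratio is one literal reading of the claim rather than a confirmed formula.

```python
def spectral_diversity(cur_energy, prev_energy, band_number, weights=None):
    """Weighted sum of per-band energy ratios for bands above band_number."""
    total = 0.0
    for i in range(band_number + 1, len(cur_energy)):
        # Ratio between current-frame and previous-frame energy in band i;
        # the small floor only guards against division by zero.
        ratio = cur_energy[i] / max(prev_energy[i], 1e-12)
        weight = weights[i] if weights is not None else 1.0
        total += weight * ratio
    return total


# Hypothetical band energies; only bands 2 and 3 lie above band_number=1.
diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], band_number=1)
```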

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (音声信号) .
JP2005110127A
CLAIM 3
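The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of the audio signal (音声信号: sound signal) using the signal from the wind noise detection means .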

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (音声信号) and the complementary non-stationarity parameter .
JP2005110127A
CLAIM 3
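The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .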

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (備えること) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
JP2005110127A
CLAIM 1
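A wind noise detection device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising [備えること] (first group) : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .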
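Illustrative sketch (not taken from either cited document): the steps recited in US8990073B2 claim 28 above, namely splitting the bands into two groups, taking one energy value per group, forming their ratio and tracking a long-term value, might be written as below. The smoothing form and the `alpha` value are assumptions; the claim only states that a long-term value is calculated from the parameter.

```python
from typing import Sequence


def noise_character(band_energies: Sequence[float],
                    n_first: int,
                    eps: float = 1e-12) -> float:
    """Ratio between the energy of the first n_first bands and the remaining bands."""
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave at least one band in each group")
    first_energy = sum(band_energies[:n_first])    # first group of bands
    second_energy = sum(band_energies[n_first:])   # rest of the bands
    return first_energy / (second_energy + eps)    # eps guard is an assumption


def update_long_term(previous: float, current: float, alpha: float = 0.9) -> float:
    """First-order recursive average (hypothetical form of the long-term value)."""
    return alpha * previous + (1.0 - alpha) * current
```

Calling `update_long_term` once per frame with the latest `noise_character` result yields a slowly varying value that could then be compared against a threshold, as claim 29 of the target patent describes.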

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2005110127A
CLAIM 3
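The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .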

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2005110127A
CLAIM 3
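The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .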

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2005110127A
CLAIM 3
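The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .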
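Illustrative sketch (not taken from either cited document): the locator, estimator and subtractor recited in US8990073B2 claim 32 above can be pictured as three small functions. Treating the first and last bins as minima and connecting minima by straight lines are assumptions; the claim only says that the spectral floor "connects the minima".

```python
from typing import List, Sequence, Tuple


def find_minima(spectrum: Sequence[float]) -> List[int]:
    """Indices of local minima; both end bins are always included (assumption)."""
    n = len(spectrum)
    if n == 0:
        return []
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def spectral_floor(spectrum: Sequence[float], minima: Sequence[int]) -> List[float]:
    """Piecewise-linear floor that connects consecutive minima (assumption)."""
    floor = [float(v) for v in spectrum]
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Subtract the floor from the spectrum; also return the minima indices."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

The returned minima indices also delimit the peaks, since the independent claims define each peak as the piece of the residual spectrum between a pair of successive minima.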

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (音声信号) .
JP2005110127A
CLAIM 3
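The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .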

US8990073B2
CLAIM 35
. A device for detecting sound activity (風雑音検出) in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JP2005110127A
CLAIM 1
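A wind noise detection [風雑音検出] (tonal sound, sound activity, sound activity detector, detecting sound activity) device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .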

JP2005110127A
CLAIM 3
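The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .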
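Illustrative sketch (not taken from the cited document): the signal chain of JP2005110127A claim 1 quoted above, namely difference signal, sum signal, absolute values, and the difference of the two absolute values, can be written sample by sample as below. The two-microphone setup, the function name and the order of the final subtraction are assumptions; the claim does not say how the output is smoothed or thresholded.

```python
from typing import List, Sequence


def wind_noise_detection_signal(mic_a: Sequence[float],
                                mic_b: Sequence[float]) -> List[float]:
    """Per-sample |a - b| - |a + b| for two microphone signals (assumed form)."""
    if len(mic_a) != len(mic_b):
        raise ValueError("both microphone signals must have the same length")
    out = []
    for a, b in zip(mic_a, mic_b):
        diff_abs = abs(a - b)   # first absolute value conversion (difference signal)
        sum_abs = abs(a + b)    # second absolute value conversion (sum signal)
        out.append(diff_abs - sum_abs)
    return out
```

With identical inputs on both microphones the difference signal vanishes and the output is negative; with inputs of opposite sign the sum signal vanishes and the output is positive.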

US8990073B2
CLAIM 36
. A device for detecting sound activity (風雑音検出) in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2005110127A
CLAIM 1
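A wind noise detection [風雑音検出] (tonal sound, sound activity, sound activity detector, detecting sound activity) device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .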

JP2005110127A
CLAIM 3
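The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .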

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (風雑音検出) detector .
JP2005110127A
CLAIM 1
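A wind noise detection [風雑音検出] (tonal sound, sound activity, sound activity detector, detecting sound activity) device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .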

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (風雑音検出) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
JP2005110127A
CLAIM 1
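A wind noise detection [風雑音検出] (tonal sound, sound activity, sound activity detector, detecting sound activity) device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .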
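Illustrative sketch (not taken from either cited document): the comparator recited in US8990073B2 claim 38 above compares an average SNR with a threshold that depends on a long-term SNR. The claim does not give the form of that dependence, so the linear mapping, its `slope`, `offset` and `floor` values, and the function names below are purely hypothetical.

```python
def snr_threshold(snr_lt: float,
                  slope: float = 0.1,
                  offset: float = 2.0,
                  floor: float = 1.0) -> float:
    """Hypothetical threshold: a clamped linear function of the long-term SNR."""
    return max(floor, slope * snr_lt + offset)


def is_active(snr_av: float, snr_lt: float) -> bool:
    """Active sound is declared when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt)
```

Making the threshold depend on the long-term SNR lets the decision adapt to the noise conditions instead of relying on a single fixed value.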

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (風雑音検出) detector .
JP2005110127A
CLAIM 1
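A wind noise detection [風雑音検出] (tonal sound, sound activity, sound activity detector, detecting sound activity) device that comprises a plurality of sound collecting means and detects wind noise from the signals of the plurality of sound collecting means , the device comprising : difference signal generating means for generating a difference signal from the signals obtained by the plurality of sound collecting means ; sum signal generating means for generating a sum signal from the signals obtained by the plurality of sound collecting means ; first absolute value converting means for converting the difference signal generated by the difference signal generating means into an absolute value signal ; second absolute value converting means for converting the sum signal generated by the sum signal generating means into an absolute value signal ; and wind noise detecting means for obtaining the difference between the difference signal and the sum signal obtained through conversion by the first and second absolute value converting means and outputting it as a wind noise detection signal .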

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JP2005110127A
CLAIM 3
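The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .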

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JP2005110127A
CLAIM 3
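The wind noise detection device according to claim 1 or 2 , further comprising low-frequency-region continuous control means for continuously controlling the level of the low-frequency region of an audio signal [音声信号] (sound signal) using the signal from the wind noise detection means .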




WO2004027368A1

Filed: 2003-09-11     Issued: 2004-04-01

Audio decoding apparatus and method

(Original Assignee) Matsushita Electric Industrial Co., Ltd.; Nec Corporation     

Naoya Tanaka, Osamu Shimada, Mineo Tsushima, Takeshi Norimatsu, Kok Seng Chong, Kim Hann Kuah, Sua Hong Neo, Toshiyuki Nomura, Yuichiro Takamizawa, Masahiro Serizawa
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (high frequency component) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
WO2004027368A1
CLAIM 9
. The audio decoding apparatus as described in claim 2 , wherein the bitstream contains encoded information for a narrowband audio signal and additional information used for enabling narrowband to wideband , the additional information containing high frequency component (frequency bins, frequency bin) information describing a feature of a signal in a higher frequency band than the frequency band of the first subband signals ;
and the bitstream demultiplexer further demultiplexes the additional information from the bitstream ;
and the band expander generates multiple second subband signals in a higher frequency band than the frequency band of the first subband signals from at least one first subband signal and the high frequency component information in the additional information .
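Illustrative sketch (not taken from either cited document): the correlation-map steps recited in US8990073B2 claim 4 above, namely one normalized correlation value per peak, written back over every bin between the two minima that delimit the peak, might be coded as follows. The cosine-style normalization and the `minima` index list passed in by the caller are assumptions; the claim does not fix the normalization formula.

```python
import math
from typing import List, Sequence


def correlation_map(curr_res: Sequence[float],
                    prev_res: Sequence[float],
                    minima: Sequence[int]) -> List[float]:
    """Per-bin map holding the normalized correlation score of each peak.

    curr_res, prev_res: residual spectra of the current and previous frame.
    minima: increasing bin indices of the minima that delimit the peaks.
    """
    if len(curr_res) != len(prev_res):
        raise ValueError("both residual spectra must have the same length")
    cmap = [0.0] * len(curr_res)
    for a, b in zip(minima, minima[1:]):
        x = curr_res[a:b + 1]
        y = prev_res[a:b + 1]
        num = sum(xi * yi for xi, yi in zip(x, y))
        den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
        score = num / den if den > 0.0 else 0.0   # score assigned to this peak
        for i in range(a, b + 1):
            cmap[i] = score                        # same value over the whole peak
    return cmap
```

A bin that is a minimum shared by two neighbouring peaks ends up with the score of the later peak in this sketch; how such boundary bins are handled is not stated in the claim.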

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (high frequency component) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (high frequency component) so as to produce a summed long-term correlation map .
WO2004027368A1
CLAIM 9
. The audio decoding apparatus as described in claim 2 , wherein the bitstream contains encoded information for a narrowband audio signal and additional information used for enabling narrowband to wideband , the additional information containing high frequency component (frequency bins, frequency bin) information describing a feature of a signal in a higher frequency band than the frequency band of the first subband signals ;
and the bitstream demultiplexer further demultiplexes the additional information from the bitstream ;
and the band expander generates multiple second subband signals in a higher frequency band than the frequency band of the first subband signals from at least one first subband signal and the high frequency component information in the additional information .
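Illustrative sketch (not taken from either cited document): the two steps recited in US8990073B2 claim 5 above, a one-pole filter applied bin by bin followed by a sum over bins, could be expressed as below. The recursion form and the default `alpha` (standing in for the update factor of claim 1) are assumptions.

```python
from typing import List, Sequence


def update_long_term_map(lt_map: Sequence[float],
                         cmap: Sequence[float],
                         alpha: float = 0.9) -> List[float]:
    """One-pole (first-order recursive) filter applied independently to each bin."""
    if len(lt_map) != len(cmap):
        raise ValueError("both maps must have one entry per frequency bin")
    return [alpha * old + (1.0 - alpha) * new for old, new in zip(lt_map, cmap)]


def summed_long_term_map(lt_map: Sequence[float]) -> float:
    """Sum of the filtered map over all frequency bins."""
    return sum(lt_map)
```

Starting from an initial map (for example all zeros) and calling `update_long_term_map` once per frame yields a map that grows where peaks stay correlated across frames, which is the behaviour the long-term correlation map is meant to capture.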

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (high frequency component) having a magnitude that exceeds a given fixed threshold .
WO2004027368A1
CLAIM 9
. The audio decoding apparatus as described in claim 2 , wherein the bitstream contains encoded information for a narrowband audio signal and additional information used for enabling narrowband to wideband , the additional information containing high frequency component (frequency bins, frequency bin) information describing a feature of a signal in a higher frequency band than the frequency band of the first subband signals ;
and the bitstream demultiplexer further demultiplexes the additional information from the bitstream ;
and the band expander generates multiple second subband signals in a higher frequency band than the frequency band of the first subband signals from at least one first subband signal and the high frequency component information in the additional information .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (adjacent sub) .
WO2004027368A1
CLAIM 6
. The audio decoding apparatus as described in claim 5 , wherein the aliasing detector evaluates a parameter denoting a slope of a frequency distribution in two adjacent sub (next frame) bands , and detects the degree of occurrence of aliasing components in the two subbands .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (adjacent sub) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
WO2004027368A1
CLAIM 6
. The audio decoding apparatus as described in claim 5 , wherein the aliasing detector evaluates a parameter denoting a slope of a frequency distribution in two adjacent sub (next frame) bands , and detects the degree of occurrence of aliasing components in the two subbands .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (high frequency component) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (high frequency component) so as to produce a summed long-term correlation map .
WO2004027368A1
CLAIM 9
. The audio decoding apparatus as described in claim 2 , wherein the bitstream contains encoded information for a narrowband audio signal and additional information used for enabling narrowband to wideband , the additional information containing high frequency component (frequency bins, frequency bin) information describing a feature of a signal in a higher frequency band than the frequency band of the first subband signals ;
and the bitstream demultiplexer further demultiplexes the additional information from the bitstream ;
and the band expander generates multiple second subband signals in a higher frequency band than the frequency band of the first subband signals from at least one first subband signal and the high frequency component information in the additional information .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (square root) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
WO2004027368A1
CLAIM 14
. The audio decoding apparatus as described in claim 13 , wherein gain g of the second subband signal is g = sqrt{R/E/(1+Q)} where sqrt is a square root (noise ratio) operator .
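Illustrative sketch (not taken from either cited document): the gain expression in WO2004027368A1 claim 14 above, read left to right as g = sqrt((R / E) / (1 + Q)). The quantities R, E and Q are defined in a claim that is not reproduced in this chart, so they are treated here as plain numbers; the function name is hypothetical.

```python
import math


def subband_gain(r: float, e: float, q: float) -> float:
    """g = sqrt(R / E / (1 + Q)), evaluated left to right as written in the claim."""
    return math.sqrt(r / e / (1.0 + q))
```

For instance, `subband_gain(8.0, 2.0, 1.0)` evaluates the square root of 8 / 2 / 2.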




US20040225505A1

Filed: 2003-05-08     Issued: 2004-11-11

Audio coding systems and methods using spectral component coupling and spectral component regeneration

(Original Assignee) Dolby Laboratories Licensing Corp     (Current Assignee) Dolby Laboratories Licensing Corp

Robert Andersen, Michael Truman, Philip Williams, Stephen Vernon
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (third sets) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20040225505A1
CLAIM 30
. The method according to claim 28 that comprises : obtaining from the encoded signal an indication of frequency extents of the first , second or third sets (frequency bin, frequency bin basis) of frequency subbands ;
and adapting the generation of synthesized signals and decoupled signals in response to the indication .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band information) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20040225505A1
CLAIM 26
. The method according to claim 25 that adapts the generation of the associated synthesized signal in response to subband information (frequency bands, first frequency bands) conveyed in the encoded signal that specifies frequency extents of the frequency subbands .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band information) into a first group of a certain number of first frequency (analysis filterbank) bands and a second group of a rest of the frequency bands ;

calculating a first energy (more output) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20040225505A1
CLAIM 7
. The method according to claim 6 that comprises : applying a first analysis filterbank (first frequency) to the one or more input audio signals to obtain the one or more baseband signals and the one or more residual signals ;
and applying a second analysis filterbank to the one or more input audio signals to obtain additional spectral components ;
wherein the energy measures of spectral components in the residual signals are calculated from the spectral components of the residual signals and one or more of the additional spectral components .

US20040225505A1
CLAIM 18
. A method for decoding an encoded signal representing one or more input audio signals , wherein the method comprises : obtaining scaling information and signal information from the encoded signal , wherein the scaling information represents scale factors calculated from square roots of ratios of energy measures of spectral components or ratios of square roots of energy measures of spectral components , and the signal information represents spectral components for one or more baseband signals , wherein the spectral components in each baseband signal represent spectral components of a respective input audio signal in a first set of frequency subbands ;
generating for each respective baseband signal an associated synthesized signal having spectral components in a second set of frequency subbands that are not represented by the respective baseband signal , wherein the spectral components in the associated synthesized signal are scaled by multiplication or division according to one or more of the scale factors ;
and generating one or more output (first energy) audio signals , wherein each output audio signal represents a respective input audio signal and is generated from the spectral components in a respective baseband signal and its associated synthesized signal .

US20040225505A1
CLAIM 26
. The method according to claim 25 that adapts the generation of the associated synthesized signal in response to subband information (frequency bands, first frequency bands) conveyed in the encoded signal that specifies frequency extents of the frequency subbands .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (third sets) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20040225505A1
CLAIM 30
. The method according to claim 28 that comprises : obtaining from the encoded signal an indication of frequency extents of the first , second or third sets (frequency bin, frequency bin basis) of frequency subbands ;
and adapting the generation of synthesized signals and decoupled signals in response to the indication .




GB2400003A

Filed: 2003-03-22     Issued: 2004-09-29

Pitch estimation within a speech signal

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Motorola Solutions Inc

Halil Fikretler, Jonathan Gibbs
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (Kalman filter) , the correlation map of a current frame , and an initial value of the long term correlation map .
GB2400003A
CLAIM 22
. A method according to any one of the preceding claims , wherein the estimated pitch track is input to a prototype waveform interpolation or a Kalman filter (update factor) method .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (peak values) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
GB2400003A
CLAIM 5
. A method according to any one of the preceding claims , wherein the determination of selected sample points as candidate positions includes the identification of the maximum value of the filtered signal , other peak values (correlation value) above a relative threshold to this maximum also being selected unless they are respectively : i) below a given threshold smaller than both adjacent peak values , typically 30% , ii) below a given threshold smaller than any adjacent peak value , typically 70% , or iii) less than a specified number of samples away from a larger peak value .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (sample point) energies .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal , comprising steps A to E wherein A) Selecting sample points (residual error) in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by , during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (previous frame, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by , during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame, speech signal) indicative of an activity of the sound signal .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by , during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame, speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by , during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by , during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (previous frame, speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by ;
during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .
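
Illustrative sketch (not part of either claim): the band-energy ratio recited in claim 28 of US8990073B2 could be computed roughly as follows. The function name, the use of plain sums as the two energy values, the smoothing factor alpha, and the exponential form of the long-term value are all assumptions made for illustration; the claim language fixes none of them.

```python
def noise_character(band_energies, n_first, prev_long_term, alpha=0.9):
    """Return (ratio, long_term) for one frame.

    band_energies  : per-band energies for the current frame
    n_first        : how many leading bands form the first group (assumed split)
    prev_long_term : long-term value carried over from the previous frame
    alpha          : assumed smoothing factor for the long-term value
    """
    first_energy = sum(band_energies[:n_first])   # first group of bands
    rest_energy = sum(band_energies[n_first:])    # remaining bands
    eps = 1e-12                                   # guards against division by zero
    ratio = first_energy / (rest_energy + eps)
    # Assumed exponential smoothing; the claim only says "a long-term value".
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term


ratio, long_term = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2,
                                   prev_long_term=0.0, alpha=0.5)
```

Under these assumptions the example frame gives a ratio close to 4 and a long-term value close to 2.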

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (previous frame, speech signal) inferior than a given fixed threshold .
GB2400003A
CLAIM 1
. A method of sample-by-sample pitch estimation within a speech signal (activity prediction parameter, noise character parameter) , comprising steps A to E wherein A) Selecting sample points in the speech signal as candidate positions ;
B) Estimating candidate pitches at each of the plurality of said candidate positions ;
C) Refining said candidate pitches to sub-integer pitch estimates ;
D) Selecting from among said candidate pitches at each of the plurality of said candidate positions ;
and E) Interpolating between pitches selected at each of the plurality of said candidate positions , characterized by ;
during step A , selecting said candidate positions over a time history from peaks obtained using linear predictive coding inverse filtration of the speech signal .

GB2400003A
CLAIM 13
. A method according to any one of claims 2-12 , wherein for each of the current and previous frame (activity prediction parameter, noise character parameter) , having been stored , the candidate pitch with the highest reliability over the total of the plurality of candidate positions is selected , said candidate pitch and associated candidate position defining a reference point .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (Kalman filter) , the correlation map of a current frame , and an initial value of the long-term correlation map .
GB2400003A
CLAIM 22
. A method according to any one of the preceding claims , wherein the estimated pitch track is input to a prototype waveform interpolation or a Kalman filter (update factor) method .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (Kalman filter) , the correlation map of a current frame , and an initial value of the long-term correlation map .
GB2400003A
CLAIM 22
. A method according to any one of the preceding claims , wherein the estimated pitch track is input to a prototype waveform interpolation or a Kalman filter (update factor) method .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7209567B1

Filed: 2003-03-10     Issued: 2007-04-24

Communication system with adaptive noise suppression

(Original Assignee) Purdue Research Foundation     (Current Assignee) Purdue Research Foundation

David Kozel, James A. Devault, Richard B. Birr
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (time frames) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames (frequency spectrum) to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .
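
Illustrative sketch (not part of either claim): the subtraction step recited in claim 1 of US7209567B1 can be read as a per-bin spectral subtraction scaled by the lesser of two sub-band ratios. The function and argument names, the bin-to-band lookup list, and the flooring of negative results at zero are assumptions for illustration only.

```python
def estimate_speech_magnitude(input_mag, avg_noise, band_of_bin,
                              running_max_ratio, speech_detected):
    """Per-bin speech magnitude estimate for one frame.

    input_mag         : input magnitude per frequency bin
    avg_noise         : average noise magnitude per bin
    band_of_bin       : sub-band index of each bin (hypothetical layout)
    running_max_ratio : per-band running average of max(noise / avg noise)
    speech_detected   : result of a separate speech detector
    """
    if not speech_detected:
        return [0.0] * len(input_mag)     # claim: estimate is zero without speech

    n_bands = len(running_max_ratio)
    input_sum = [0.0] * n_bands
    noise_sum = [0.0] * n_bands
    for k, b in enumerate(band_of_bin):
        input_sum[b] += input_mag[k]
        noise_sum[b] += avg_noise[k]

    estimate = []
    for k, b in enumerate(band_of_bin):
        # (a) sum of noisy speech over sum of average noise in the sub-band
        ratio_a = input_sum[b] / noise_sum[b] if noise_sum[b] > 0 else 0.0
        # the lesser of (a) and (b) scales the subtracted noise
        factor = min(ratio_a, running_max_ratio[b])
        # flooring at zero is an assumption; the claim does not state it
        estimate.append(max(input_mag[k] - factor * avg_noise[k], 0.0))
    return estimate
```

Nothing in this step computes a residual spectrum relative to spectral minima; it subtracts a scaled noise average, which is a point to weigh when assessing the mapping.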

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (time frames) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames (frequency spectrum) to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .
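
Illustrative sketch (not part of either claim): the three steps of claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract) might look like this. Treating "connecting the minima" as straight-line interpolation, including the two end bins as anchor points, and the exact local-minimum test are assumptions; the claim does not specify them.

```python
def find_minima(spectrum):
    """Indices of local minima; end bins are added as anchors (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor through the minima from the spectrum."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)                       # position between minima
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima


residual, minima = residual_spectrum([1.0, 3.0, 1.0, 5.0, 1.0])
# residual == [0.0, 2.0, 0.0, 4.0, 0.0], minima == [0, 2, 4]
```

With a straight-line floor the residual can dip below zero where the spectrum falls under the connecting line; how such bins are treated is not stated in the claim.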

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (frequency bins) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US7209567B1
CLAIM 12
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins (frequency bins) for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining , for each sub-band , a noise sum equal to the sum of the average noise magnitudes for the bins in the sub-band ;
for each of said frames in said first set , a) determining the ratio of noise to said average noise for each bin ;
b) determining for each sub-band , the maximum ratio of noise to said average noise for the bins therein ;
determining a running average of said maximum ratio for each sub-band ;
and during a noise reduction mode , for each frame in a second set of time frames , a) obtaining , for each sub-band , an input signal sum equal to the sum of the magnitudes of an input sound signal for the bins in the sub-band ;
b) determining the ratio of said input signal sum to said noise sum ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) the ratio of said input signal sum to said noise sum ;
and iv) said running average .
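
Illustrative sketch (not part of either claim): claim 4 of US8990073B2 assigns each detected peak a normalized correlation with the previous residual spectrum and spreads that score over the peak's bins. The specific normalization (cross-product over the square root of the energy product), the handling of a bin shared by two adjacent peaks, and all names below are assumptions.

```python
import math


def correlation_map(curr_res, prev_res, minima):
    """Build a per-bin correlation map from peaks delimited by minima.

    curr_res, prev_res : residual spectra of the current and previous frame
    minima             : sorted bin indices of minima in the current residual
    """
    cor_map = [0.0] * len(curr_res)
    for a, b in zip(minima, minima[1:]):        # one peak per pair of minima
        cur = curr_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0   # assumed: zero when undefined
        for i in range(a, b + 1):
            # a boundary bin shared by two peaks takes the later peak's score
            cor_map[i] = score
    return cor_map


cmap = correlation_map([0.0, 2.0, 0.0, 4.0, 0.0],
                       [0.0, 2.0, 0.0, 0.0, 0.0],
                       [0, 2, 4])
# cmap == [1.0, 1.0, 0.0, 0.0, 0.0]
```

In the example the first peak is unchanged between frames and scores 1, while the second peak has no counterpart in the previous frame and scores 0.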

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US7209567B1
CLAIM 12
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins (frequency bins) for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining , for each sub-band , a noise sum equal to the sum of the average noise magnitudes for the bins in the sub-band ;
for each of said frames in said first set , a) determining the ratio of noise to said average noise for each bin ;
b) determining for each sub-band , the maximum ratio of noise to said average noise for the bins therein ;
determining a running average of said maximum ratio for each sub-band ;
and during a noise reduction mode , for each frame in a second set of time frames , a) obtaining , for each sub-band , an input signal sum equal to the sum of the magnitudes of an input sound signal for the bins in the sub-band ;
b) determining the ratio of said input signal sum to said noise sum ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) the ratio of said input signal sum to said noise sum ;
and iv) said running average .
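
Illustrative sketch (not part of either claim): the one-pole filtering and summation of claim 5 of US8990073B2 can be written as a per-bin recursive average followed by a sum. The coefficient name alpha and the particular first-order form used here are assumptions; the claim only requires a one-pole filter applied bin by bin.

```python
def update_long_term_map(lt_map, cor_map, alpha):
    """Filter the correlation map bin by bin and sum the result.

    lt_map  : long-term correlation map from the previous frame
              (its initial value on the first frame)
    cor_map : correlation map of the current frame
    alpha   : assumed update factor, 0 <= alpha < 1
    """
    new_lt = [alpha * lt + (1.0 - alpha) * c
              for lt, c in zip(lt_map, cor_map)]
    return new_lt, sum(new_lt)


lt_map, total = update_long_term_map([0.0, 0.0], [1.0, 0.5], alpha=0.5)
# lt_map == [0.5, 0.25], total == 0.75
```

The returned sum plays the role of the summed long-term correlation map, which the dependent claims compare against a threshold.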

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency bins) having a magnitude that exceeds a given fixed threshold .
US7209567B1
CLAIM 12
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins (frequency bins) for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining , for each sub-band , a noise sum equal to the sum of the average noise magnitudes for the bins in the sub-band ;
for each of said frames in said first set , a) determining the ratio of noise to said average noise for each bin ;
b) determining for each sub-band , the maximum ratio of noise to said average noise for the bins therein ;
determining a running average of said maximum ratio for each sub-band ;
and during a noise reduction mode , for each frame in a second set of time frames , a) obtaining , for each sub-band , an input signal sum equal to the sum of the magnitudes of an input sound signal for the bins in the sub-band ;
b) determining the ratio of said input signal sum to said noise sum ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) the ratio of said input signal sum to said noise sum ;
and iv) said running average .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (noise reduction) indicative of sound activity in the sound signal .
US7209567B1
CLAIM 12
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining , for each sub-band , a noise sum equal to the sum of the average noise magnitudes for the bins in the sub-band ;
for each of said frames in said first set , a) determining the ratio of noise to said average noise for each bin ;
b) determining for each sub-band , the maximum ratio of noise to said average noise for the bins therein ;
determining a running average of said maximum ratio for each sub-band ;
and during a noise reduction (adaptive threshold) mode , for each frame in a second set of time frames , a) obtaining , for each sub-band , an input signal sum equal to the sum of the magnitudes of an input sound signal for the bins in the sub-band ;
b) determining the ratio of said input signal sum to said noise sum ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) the ratio of said input signal sum to said noise sum ;
and iv) said running average .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
US7209567B1
CLAIM 18
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining an indication of noise strength for each sub-band ;
for each of said frames in said first set , determining a noise deviation for each sub-band by a) determining the ratio of noise to said average noise for each bin ;
b) determining , for the sub-band , the maximum ratio of noise to said average noise for the bins therein ;
and during a noise reduction mode , for each frame in a second set of time frames in which an input signal is received , a) obtaining an indication of input signal strength for each sub-band ;
b) determining a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) as the ratio of said input signal strength indication to said noise strength indication ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) said signal-to-noise ratio ;
and iv) said noise deviation .
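
Illustrative sketch (not part of either claim): claim 15 of US8990073B2 uses noise energy estimates from the previous frame in the SNR calculation, while claim 18 of US7209567B1 defines a signal-to-noise ratio as signal strength over noise strength per sub-band. A minimal per-band version is shown below; the function name, the linear (non-dB) scale, and the small floor on the noise estimate are assumptions.

```python
def band_snr(band_energies, prev_noise_estimates, floor=1e-12):
    """Per-band SNR of the current frame against last frame's noise estimates.

    band_energies        : current-frame energy per band
    prev_noise_estimates : noise energy per band as estimated in the previous frame
    floor                : assumed lower bound that avoids division by zero
    """
    return [e / max(n, floor)
            for e, n in zip(band_energies, prev_noise_estimates)]


snr = band_snr([4.0, 1.0], [2.0, 0.5])
# snr == [2.0, 2.0]
```

Using the previous frame's estimates means the current frame's own content does not influence the noise level it is compared against.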

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal (noise character parameter, activity prediction parameter) magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (time frames) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames (frequency spectrum) to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .
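
The speech-magnitude estimate recited in US7209567B1 claim 1 can be illustrated with a minimal sketch for a single sub-band. The function and argument names are hypothetical, the noise averages are assumed to be positive, and the result is left unclamped because the claim does not recite a floor.

```python
def estimate_speech_magnitude(input_mag, avg_noise, running_avg_max_ratio, speech_detected):
    """Per-bin speech magnitude estimate for the bins of one sub-band in one frame.

    input_mag: noise-corrupted input magnitudes for the bins of the sub-band.
    avg_noise: average noise magnitude for the same bins (assumed > 0 in total).
    running_avg_max_ratio: running average of the maximum noise-to-average-noise ratio.
    """
    if not speech_detected:
        # Claim language: the estimate is zero while speech is not detected.
        return [0.0 for _ in input_mag]
    # (a) ratio of the sum of noise-corrupted speech to the sum of average noise.
    subband_ratio = sum(input_mag) / sum(avg_noise)
    # Use the lesser of (a) and (b) as the multiplier on the average noise.
    factor = min(subband_ratio, running_avg_max_ratio)
    return [x - n * factor for x, n in zip(input_mag, avg_noise)]
```

With input magnitudes [4.0, 6.0], average noise [1.0, 1.0] and a running average of 2.0, the sub-band ratio is 5.0, so the multiplier is 2.0.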

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (time frames) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames (frequency spectrum) to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .
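
For comparison with the prior-art claim, the peak detector of US8990073B2 claim 31 can be sketched as follows: each peak is the piece of the residual spectrum between two successive minima. The helper names are hypothetical, and treating the first and last bins as minima is an assumption not stated in the claim.

```python
def find_minima(spectrum):
    """Indices of local minima; the first and last bins are included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        # Strictly below the left neighbour, not above the right neighbour.
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def detect_peaks(residual):
    """Return (start, end) bin pairs, one per piece between successive minima."""
    minima = find_minima(residual)
    return list(zip(minima[:-1], minima[1:]))
```

For a residual spectrum [0, 3, 0, 5, 2, 0] this yields two peaks, spanning bins 0 to 2 and bins 2 to 5.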

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (time frames) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7209567B1
CLAIM 1
. A method of reducing noise in a communication system , the method comprising : averaging an input sound signal's magnitude spectrum over multiple time frames (frequency spectrum) to reduce musical noise ;
determining an average magnitude of a noise spectrum while speech is not present on the input sound signal , wherein the average magnitude is determined for each of a plurality of discrete frequencies of the noise spectrum ;
determining a maximum ratio of noise to average noise over each of a plurality of sub-bands ;
determining a running average of the maximum ratio of noise to average noise over each sub-band ;
receiving an indication that speech may be present on the input sound signal ;
and for each of a plurality of frames while receiving the indication that speech may be present on the input sound signal ;
detecting whether speech is present ;
while speech is detected , estimating a speech signal magnitude for each discrete frequency by subtracting from the input sound signal magnitude for that discrete frequency the average noise for that discrete frequency multiplied by the lesser of (a) a ratio of a sum of noise-corrupted speech to a sum of average noise for the frequency sub-band containing that discrete frequency and (b) the running average of the maximum ratio of noise to average noise for the frequency sub-band containing that discrete frequency ;
and while speech is not detected , estimating the speech signal magnitude to be zero .
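
The three elements of US8990073B2 claim 32 (locating minima, connecting them into a spectral floor, subtracting the floor) can be sketched as below. This is an illustration only: straight-line interpolation between minima is one reading of "connects the minima", the function name is hypothetical, and the input is assumed to hold at least two bins.

```python
def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor drawn through the minima."""
    n = len(spectrum)
    # Locate the minima; end bins are treated as minima (assumption).
    interior = [i for i in range(1, n - 1)
                if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]
    minima = [0] + interior + [n - 1]
    # Estimate the floor by connecting successive minima with straight lines.
    floor = [0.0] * n
    for a, b in zip(minima[:-1], minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    # Subtract the floor from the spectrum to obtain the residual.
    return [s - f for s, f in zip(spectrum, floor)]
```

By construction the residual is zero at every minimum, so only the peaks between minima remain.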

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency bins) so as to produce a summed long-term correlation map .
US7209567B1
CLAIM 12
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins (frequency bins) for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining , for each sub-band , a noise sum equal to the sum of the average noise magnitudes for the bins in the sub-band ;
for each of said frames in said first set , a) determining the ratio of noise to said average noise for each bin ;
b) determining for each sub-band , the maximum ratio of noise to said average noise for the bins therein ;
determining a running average of said maximum ratio for each sub-band ;
and during a noise reduction mode , for each frame in a second set of time frames , a) obtaining , for each sub-band , an input signal sum equal to the sum of the magnitudes of an input sound signal for the bins in the sub-band ;
b) determining the ratio of said input signal sum to said noise sum ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) the ratio of said input signal sum to said noise sum ;
and iv) said running average .
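
The initialization/update mode of US7209567B1 claim 12 can be sketched as a per-sub-band running average of the maximum noise-to-average-noise ratio. The claim does not state how the running average is formed, so an incremental arithmetic mean is assumed here; all names are hypothetical.

```python
def running_avg_max_ratio(noise_frames, avg_noise, subbands):
    """Running average, per sub-band, of the maximum ratio of noise to average noise.

    noise_frames: one list of per-bin noise magnitudes per time frame.
    avg_noise: per-bin average noise magnitude (assumed > 0).
    subbands: list of (start, stop) bin ranges, stop exclusive.
    """
    running = [0.0] * len(subbands)
    for count, frame in enumerate(noise_frames, start=1):
        for s, (lo, hi) in enumerate(subbands):
            # Maximum ratio of noise to average noise over the bins of the sub-band.
            max_ratio = max(frame[k] / avg_noise[k] for k in range(lo, hi))
            # Incremental arithmetic mean (assumed form of the running average).
            running[s] += (max_ratio - running[s]) / count
    return running
```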

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US7209567B1
CLAIM 18
. A method of reducing noise in a communication system , the method comprising : designating a plurality of frequency sub-bands for a signal spectrum of interest ;
designating a plurality of frequency bins for each of said sub-bands ;
during an initialization/update mode , determining , for each bin , an average magnitude of noise in said system over a first set of time frames ;
obtaining an indication of noise strength for each sub-band ;
for each of said frames in said first set , determining a noise deviation for each sub-band by a) determining the ratio of noise to said average noise for each bin ;
b) determining , for the sub-band , the maximum ratio of noise to said average noise for the bins therein ;
and during a noise reduction mode , for each frame in a second set of time frames in which an input signal is received , a) obtaining an indication of input signal strength for each sub-band ;
b) determining a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) as the ratio of said input signal strength indication to said noise strength indication ;
and c) estimating a speech signal magnitude for a given bin as a function of i) the input sound signal magnitude for the given bin ;
ii) said average noise for the given bin ;
iii) said signal-to-noise ratio ;
and iv) said noise deviation .
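
The comparator of US8990073B2 claim 38 can be sketched as follows. The claim only says that the threshold is a function of the long-term SNR; the linear form and the slope and offset values below are placeholders chosen for illustration and are not taken from the patent.

```python
def sad_threshold(snr_lt, slope=0.5, offset=10.0):
    """Hypothetical threshold as a linear function of the long-term SNR (SNR_LT)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR (SNR_av) with the SNR_LT-dependent threshold."""
    return snr_av > sad_threshold(snr_lt)
```

With these placeholder values, a long-term SNR of 20 gives a threshold of 20, so an average SNR of 25 is classified as active and 15 as inactive.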




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1474755A1

Filed: 2003-02-11     Issued: 2004-11-10

Filter set for frequency analysis

(Original Assignee) Audience LLC     (Current Assignee) Audience LLC

Lloyd Watts
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (frequency channel) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (frequency channel) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (sampled signal) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1474755A1
CLAIM 1
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first set of low pass filters to derive a first set of frequency components wherein the first set of low pass filters are arranged serially in a chain having a first low pass filter and a last low pass filter , the output of each low pass filter being fed to the next low pass filter in the chain until the last low pass filter ;
downsampling the output of the last low pass filter to produce a downsampled signal (frequency bins) ;
processing the downsampled signal with a second set of low pass filters to derive a second set of frequency components .
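
The correlation map of US8990073B2 claim 4 can be sketched as below. The claim does not give the normalization formula, so the conventional normalized cross-correlation is assumed; the function name and the list of minima passed in are hypothetical.

```python
import math


def correlation_map(curr, prev, minima):
    """Assign each peak's normalized correlation to every bin of that peak.

    curr, prev: current and previous residual spectra (same length).
    minima: sorted bin indices of the minima delimiting the peaks in curr.
    """
    cmap = [0.0] * len(curr)
    for a, b in zip(minima[:-1], minima[1:]):
        bins = range(a, b + 1)
        num = sum(curr[k] * prev[k] for k in bins)
        den = math.sqrt(sum(curr[k] ** 2 for k in bins) *
                        sum(prev[k] ** 2 for k in bins))
        # The peak's score is its normalized correlation (0 if either side is empty).
        score = num / den if den > 0.0 else 0.0
        # A minimum shared by two peaks ends up with the later peak's score.
        for k in bins:
            cmap[k] = score
    return cmap
```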

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (sampled signal) so as to produce a summed long-term correlation map .
EP1474755A1
CLAIM 1
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first set of low pass filters to derive a first set of frequency components wherein the first set of low pass filters are arranged serially in a chain having a first low pass filter and a last low pass filter , the output of each low pass filter being fed to the next low pass filter in the chain until the last low pass filter ;
downsampling the output of the last low pass filter to produce a downsampled signal (frequency bins) ;
processing the downsampled signal with a second set of low pass filters to derive a second set of frequency components .
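
The one-pole filtering and summation of US8990073B2 claim 5 can be sketched as follows. The filter coefficient plays the role of the update factor recited in claim 1; its value and the all-zero initial map used in the usage note are assumptions made for illustration.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole filter applied to the correlation map, frequency bin by frequency bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over the frequency bins."""
    return sum(long_term)
```

Starting from an initial map of zeros and calling update_long_term_map once per frame, the summed value grows only when the same bins stay correlated across frames, which is the sense in which it indicates tonal stability.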

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (frequency channel) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (frequency channel) comprises searching in the correlation map for frequency bins (sampled signal) having a magnitude that exceeds a given fixed threshold .
EP1474755A1
CLAIM 1
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first set of low pass filters to derive a first set of frequency components wherein the first set of low pass filters are arranged serially in a chain having a first low pass filter and a last low pass filter , the output of each low pass filter being fed to the next low pass filter in the chain until the last low pass filter ;
downsampling the output of the last low pass filter to produce a downsampled signal (frequency bins) ;
processing the downsampled signal with a second set of low pass filters to derive a second set of frequency components .

EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .
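
The strong-tone search of US8990073B2 claim 7 amounts to thresholding the correlation map. A minimal sketch, with a hypothetical function name and a placeholder threshold value that is not taken from the patent:

```python
def strong_tone_bins(cmap, threshold=0.95):
    """Frequency bins whose correlation-map value exceeds a fixed threshold."""
    return [k for k, v in enumerate(cmap) if v > threshold]
```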

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (frequency channel) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (frequency channel) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability estimation is performed according to claim 1 .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (frequency channel) is detected .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (frequency channel) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (frequency channel) based on a frequency dependent signal-to-noise ratio (SNR) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (frequency channel) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (frequency channel) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (frequency channel) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (frequency channel) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (frequency channel) prevents updating of noise energy estimates when a music signal is detected .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first filter) and prevent update of noise energy estimates on the music signal .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (frequency channel) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .
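
The spectral diversity parameter of US8990073B2 claim 24 can be sketched as below, following the claim wording literally: a per-band ratio of current to previous frame energy, combined as a weighted sum over the bands above a given band number. The names are hypothetical, uniform weights are assumed when none are supplied, and previous-frame energies are assumed to be non-zero.

```python
def spectral_diversity(curr_energy, prev_energy, band_number, weights=None):
    """Weighted sum of current/previous energy ratios over bands above band_number."""
    n = len(curr_energy)
    if weights is None:
        weights = [1.0] * n  # assumed uniform weighting
    return sum(weights[i] * curr_energy[i] / prev_energy[i]
               for i in range(band_number + 1, n))
```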

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (frequency channel) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (frequency channel) and the complementary non-stationarity parameter .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (audio stream) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1474755A1
CLAIM 6
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first low pass filter to produce a first low pass filtered signal ;
subtracting the first low pass filtered signal from the input signal to derive a first frequency (first frequency) component ;
processing the signal with a second low pass filter to produce a second low pass filtered signal ;
and subtracting the second low pass filtered signal from the first low pass filtered signal to derive a second frequency component .
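The band splitting recited in EP1474755A1 claim 6 can be sketched as follows. This is only an illustration: the claim does not specify the filter design, so the one-pole recursive low pass and the names `one_pole_lowpass`, `split_bands`, `alpha1` and `alpha2` are assumptions, and the second filter is applied to the input signal as one reading of "processing the signal".

```python
def one_pole_lowpass(x, alpha):
    """First-order recursive low pass (assumed design); smaller alpha = lower cutoff."""
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def split_bands(x, alpha1=0.5, alpha2=0.1):
    """Derive two frequency components by subtracting low pass outputs."""
    lp1 = one_pole_lowpass(x, alpha1)   # first low pass filtered signal
    lp2 = one_pole_lowpass(x, alpha2)   # second low pass filtered signal
    first = [s - a for s, a in zip(x, lp1)]     # input minus first low pass
    second = [a - b for a, b in zip(lp1, lp2)]  # first low pass minus second
    return first, second, lp2


signal = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25]
first, second, remainder = split_bands(signal)
```

Because each component is a difference of adjacent stages, the two components plus the remaining low pass output add back up to the input, which is a convenient sanity check for this kind of decomposition.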

EP1474755A1
CLAIM 18
. A system for analyzing an input signal into a plurality of frequency components as recited in claim 12 wherein the system is used for audio stream (second energy) separation .
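The noise character parameter of claim 28 of US8990073B2 can be illustrated with a short sketch. The claim does not fix the direction of the ratio, the number of bands in the first group, or the form of the long-term value, so the first-over-second ratio, the parameter `n_first`, the smoothing factor `alpha` and the first-order recursive average below are all assumptions.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    # Guard against division by zero (assumed floor value).
    return first_energy / max(second_energy, 1e-12)


def update_long_term(previous_lt, current, alpha=0.9):
    """Assumed first-order recursive average for the long-term value."""
    return alpha * previous_lt + (1.0 - alpha) * current


# Hypothetical per-band energies for two frames.
frames = [[1.0, 1.0, 2.0, 2.0], [4.0, 4.0, 1.0, 1.0]]
long_term = 0.0
for energies in frames:
    long_term = update_long_term(long_term, noise_character(energies, n_first=2))
```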

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (frequency channel) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (frequency channel) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (frequency channel) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .
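The three elements of claim 32 of US8990073B2 (locating minima, connecting them into a spectral floor, subtracting the floor) can be sketched as below. This is a minimal illustration under stated assumptions: the claim only says the floor "connects the minima", so the straight-line interpolation, the treatment of the two end bins as minima, and all function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are always included (assumption)."""
    n = len(spectrum)
    indices = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            indices.append(i)
    indices.append(n - 1)
    return indices


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum (needs at least two bins)."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


residual, minima = residual_spectrum([5.0, 1.0, 4.0, 1.0, 5.0])
```

With a straight-line floor the residual is zero at every minimum, so each piece of the residual between two successive minima is one candidate peak in the sense of claim 31. Where the spectrum falls under the connecting line the residual becomes negative; this sketch leaves such values untouched.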

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (sampled signal) so as to produce a summed long-term correlation map .
EP1474755A1
CLAIM 1
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first set of low pass filters to derive a first set of frequency components wherein the first set of low pass filters are arranged serially in a chain having a first low pass filter and a last low pass filter , the output of each low pass filter being fed to the next low pass filter in the chain until the last low pass filter ;
downsampling the output of the last low pass filter to produce a downsampled signal (frequency bins) ;
processing the downsampled signal with a second set of low pass filters to derive a second set of frequency components .
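The filter and adder of claim 33 of US8990073B2 can be illustrated as follows. Claims 30 and 31 only state that the long-term correlation map depends on an update factor, the current correlation map and an initial value, so the first-order recursive form, the factor `alpha` and the all-zero initial map are assumptions made for this sketch.

```python
def update_long_term_map(long_term_map, correlation_map, alpha=0.9):
    """Filter the correlation map bin by bin (assumed first-order recursion)."""
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term_map, correlation_map)]


def summed_long_term_map(long_term_map):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term_map)


# Hypothetical correlation maps for three consecutive frames of four bins.
frame_maps = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.2, 0.0], [0.9, 1.0, 0.0, 0.1]]
long_term = [0.0] * 4  # assumed initial value
for cor_map in frame_maps:
    long_term = update_long_term_map(long_term, cor_map)
total = summed_long_term_map(long_term)
```

Bins that stay highly correlated from frame to frame accumulate a large long-term value, so the sum grows for stable tones and stays small for noise-like frames.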

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (frequency channel) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (frequency channel) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (frequency channel) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (first filter) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (frequency channel) for distinguishing a music signal from a background noise signal (first filter) and preventing update of noise energy estimates .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter (background noise signal, noise ratio) wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (frequency channel) .
EP1474755A1
CLAIM 8
. A method of analyzing an input signal into a plurality of frequency components comprising : processing the signal with a first filter wherein the first filter is configured to separate part of the signal into a first output frequency channel (sound signal, sound signal prevents updating) ;
and processing the signal with a second filter wherein the second filter is configured to separate part of the signal into a second output frequency channel wherein the first frequency channel emphasizes higher frequencies than the second frequency channel ;
and wherein the second filter has a different Q than the first filter .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6985856B2

Filed: 2002-12-31     Issued: 2006-01-10

Method and device for compressed-domain packet loss concealment

(Original Assignee) Nokia Oyj     (Current Assignee) Provenance Asset Group LLC ; Nokia USA Inc

Ye Wang, Juha Ojanperä, Jari Korhonen
US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update (transform coefficient) of noise energy estimates on the music signal .
US6985856B2
CLAIM 8
. The method of claim 1 , wherein said at least one defective data part in the current frame includes a plurality of transform coefficients (prevent update) and said at least one of the stored data parts includes the plurality of transform coefficients in said at least one neighboring frame for recovering said at least one defective data part in the current frame .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (audio stream) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6985856B2
CLAIM 12
. An audio receiver adapted to receive packet data in audio streaming (second energy) , said receiver comprising an unpacking module for unpacking the received packet data into a bitstream indicative of audio signals , wherein the bitstream comprises a current frame and at least one neighboring frame , each frame having a plurality of data parts , said receiver characterized by a decoding module , for decoding said each frame for providing a signal indicative of the plurality of data parts in a compressed domain , by a storage module , responsive to the signal , for storing said plurality of data parts in the compressed domain in said at least one neighboring frame , and by an error concealing module for detecting at least one data part in the current frame if the current frame is defective so as to recover said at least one defective data part in the current frame based on at least one of the stored data parts in said at least one neighboring frame .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030078770A1

Filed: 2002-10-25     Issued: 2003-04-24

Method for detecting a voice activity decision (voice activity detector)

(Original Assignee) Deutsche Telekom AG     (Current Assignee) Deutsche Telekom AG

Alexander Fischer, Christoph Erdmann
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long term correlation map .
US20030078770A1
CLAIM 1
. A method for determining speech activity in a signal segment of an audio signal , the result of whether speech activity is present in the observed signal segment depending both on the spectral and on the temporal stationarity of the signal segment and/or on preceding signal segments , wherein in a first stage (initial value) , the method assesses whether spectral stationarity is present in the observed signal segment ;
and in a second stage , it is assessed whether temporal stationarity is present in the observed signal segment , the final decision on the presence of speech activity in the observed signal segment being dependent on the output values of the two stages .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (output values) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20030078770A1
CLAIM 1
. A method for determining speech activity in a signal segment of an audio signal , the result of whether speech activity is present in the observed signal segment depending both on the spectral and on the temporal stationarity of the signal segment and/or on preceding signal segments , wherein in a first stage , the method assesses whether spectral stationarity is present in the observed signal segment ;
and in a second stage , it is assessed whether temporal stationarity is present in the observed signal segment , the final decision on the presence of speech activity in the observed signal segment being dependent on the output values (correlation value) of the two stages .
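The per-peak correlation map of claim 4 of US8990073B2 can be sketched as below. The claim asks for a "normalized correlation value" without giving the normalization, so the cosine-style normalization used here is an assumption, as are the function name and the convention that a peak spans the bins between two consecutive minima inclusive.

```python
import math


def correlation_map(current, previous, minima):
    """Assign each peak's normalized correlation to all bins that the peak spans."""
    cor_map = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0  # score of this peak
        for i in range(a, b + 1):
            # A bin shared by two peaks keeps the later peak's score.
            cor_map[i] = score
    return cor_map


# Hypothetical residual spectra with minima at bins 0, 4 and 6.
current = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0, 0.0]
previous = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
cor_map = correlation_map(current, previous, [0, 4, 6])
```

In this example the first peak has the same shape in both frames and scores 1.0, while the second peak has no counterpart in the previous frame and scores 0.0.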

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (linear prediction coefficient) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20030078770A1
CLAIM 7
. The method as recited in claim 6 , wherein the decision on the stationarity is made on the basis of the previously determined linear prediction coefficients (binary decision) of the current signal segment LPC NOW and of a previously determined measure for the voicedness of the observed signal segment .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (time t) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20030078770A1
CLAIM 13
. The method as recited in one of the preceding claims , wherein the second stage outputs “stationary” as the result for STAT 2 each time that (first frequency) STAT 1 is equal to “stationary” .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US20030078770A1
CLAIM 1
. A method for determining speech activity in a signal segment of an audio signal , the result of whether speech activity is present in the observed signal segment depending both on the spectral and on the temporal stationarity of the signal segment and/or on preceding signal segments , wherein in a first stage (initial value) , the method assesses whether spectral stationarity is present in the observed signal segment ;
and in a second stage , it is assessed whether temporal stationarity is present in the observed signal segment , the final decision on the presence of speech activity in the observed signal segment being dependent on the output values of the two stages .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US20030078770A1
CLAIM 1
. A method for determining speech activity in a signal segment of an audio signal , the result of whether speech activity is present in the observed signal segment depending both on the spectral and on the temporal stationarity of the signal segment and/or on preceding signal segments , wherein in a first stage (initial value) , the method assesses whether spectral stationarity is present in the observed signal segment ;
and in a second stage , it is assessed whether temporal stationarity is present in the observed signal segment , the final decision on the presence of speech activity in the observed signal segment being dependent on the output values of the two stages .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1539137A

Filed: 2002-06-12     Issued: 2004-10-20

Method and system for generating colored comfort noise

(Original Assignee) 格鲁斯番维拉塔公司

戴维·王, 马修·拉德玛, 瓦苏德夫·S·纳雅克
US8990073B2
CLAIM 8
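. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (a detector) indicative of sound activity in the sound signal .
CN1539137A
CLAIM 12
. A system for generating comfort noise , the system comprising : an identifier for identifying , at the decoder side , a plurality of silence packets in voice data ;
an adaptive algorithm for adapting to the plurality of silence packets in the voice data , wherein the adaptive algorithm adapts over time ;
a detector (adaptive threshold) for determining the start of a silence segment ;
and a comfort noise generator for generating comfort noise by means of the adaptive algorithm at the start of the silence segment .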

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1539137A
CLAIM 2
. The method of claim 1 , further comprising the step of : identifying , at the decoder side , a transition hangover between the voice data and a silence segment , wherein the silence segment includes background noise (background noise signal) and the transition hangover representing a waiting time preceding the silence segment .

US8990073B2
CLAIM 21
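. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (background noise) and prevent update of noise energy estimates on the music signal .
CN1539137A
CLAIM 2
. The method of claim 1 , further comprising the step of : identifying , at the decoder side , a transition hangover between the voice data and a silence segment , wherein the silence segment includes background noise (background noise signal) and the transition hangover representing a waiting time preceding the silence segment .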

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

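calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (means being adapted) energy .
CN1539137A
CLAIM 32
. In a communication system having first and second communication devices communicatively coupled via a network , a system for generating comfort noise , the system comprising : an encoder for receiving a communication from the first communication device , the communication comprising speech data segments and non-speech data segments ;
a detector for distinguishing between the speech data segments and the non-speech data segments ;
the encoder being adapted to encode the communication and having means for transmitting the encoded communication ;
the encoded communication comprising encoded speech data representing the speech data ;
a decoder having an input adapted to receive the encoded communication sent by the transmitting means ;
the decoder being adapted to decode the encoded communication and to produce a decoded communication comprising decoded speech data ;
the transmitting means being adapted (average frame) not to deliver most of the non-speech data to the decoder , wherein the bandwidth required to deliver the encoded communication is less than the bandwidth required to deliver the communication ;
and a comfort noise generator coupled to the decoder and adapted to generate comfort noise based at least in part on one or both of the decoded speech data and data representing characteristics of the non-speech data .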

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (background noise) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1539137A
CLAIM 2
. The method of claim 1 , further comprising the step of : identifying , at the decoder side , a transition hangover between the voice data and a silence segment , wherein the silence segment includes background noise (background noise signal) and the transition hangover representing a waiting time preceding the silence segment .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (background noise) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1539137A
CLAIM 2
. The method of claim 1 , further comprising the step of : identifying , at the decoder side , a transition hangover between the voice data and a silence segment , wherein the silence segment includes background noise (background noise signal) and the transition hangover representing a waiting time preceding the silence segment .

US8990073B2
CLAIM 40
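. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (background noise) and preventing update of noise energy estimates .
CN1539137A
CLAIM 2
. The method of claim 1 , further comprising the step of : identifying , at the decoder side , a transition hangover between the voice data and a silence segment , wherein the silence segment includes background noise (background noise signal) and the transition hangover representing a waiting time preceding the silence segment .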




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1539138A

Filed: 2002-06-12     Issued: 2004-10-20

Method and system for implementing a low-complexity spectrum estimation technique to generate comfort noise

(Original Assignee) 格鲁斯番维拉塔公司

瓦苏德夫·S·纳雅克
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (predicted signal) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
CN1539138A
CLAIM 23
. The system of claim 22 , wherein the encoder further uses an inverse predictor to fit the input noise signal to the predicted signal (correlation value) spectrum .

US8990073B2
CLAIM 8
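. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (a detector) indicative of sound activity in the sound signal .
CN1539138A
CLAIM 22
. A system for performing spectrum estimation in order to generate comfort noise , the system comprising : a receiver for receiving an input noise signal ;
an encoder for approximating the spectrum of the input noise signal using an algorithm over a period of time ;
and a detector (adaptive threshold) for detecting the absence of a speech signal ;
a comfort noise generator for generating comfort noise based on the spectrum approximation when the absence of a speech signal is detected ;
wherein the input noise signal is substantially constant over the period of time .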

US8990073B2
CLAIM 16
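. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (noise) for a next frame .
CN1539138A
CLAIM 18
. The method of claim 1 , wherein the approximating step further comprises the steps of : detecting noise (noise estimates) between speech data ;
adapting to the noise ;
and creating a silence insertion descriptor based on the adapting step when speech is inactive .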

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (predictive coding) residual error energies .
CN1539138A
CLAIM 8
. The method of claim 1 , wherein the algorithm is a linear predictive coding (linear prediction) algorithm .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1265224A1

Filed: 2002-05-30     Issued: 2002-12-11

Method for converging a G.729 annex B compliant voice activity detection circuit

(Original Assignee) Telogy Networks Inc     (Current Assignee) Telogy Networks Inc

Dunling Li, Gokhan Sisli, Daniel Thomas
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (autocorrelation coefficients, said first value, initial values, second values) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G.729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation of said signal within a data frame , wherein said parameters are the autocorrelation coefficients (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) , which are derived in accordance with said Recommendation G.729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) for averages of the noise characteristics in accordance with said Recommendation G . 729 Annex B ;
and
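
The full-band frame energy recited in EP1265224A1 claim 2 is 10·log10(R(0)/240). A minimal sketch follows; the function names are hypothetical, and reading "changing the value of a frame counter" as an increment is an assumption.

```python
import math

def full_band_energy_db(r0, divisor=240):
    """Ten times the base-ten logarithm of R(0) divided by the constant 240."""
    return 10.0 * math.log10(r0 / divisor)

def count_frames(r0_values, reference_level_db):
    """Advance the frame counter only for frames at or above the reference level."""
    count = 0
    for r0 in r0_values:
        # Skip non-positive R(0): the logarithm is undefined there (assumed handling).
        if r0 > 0 and full_band_energy_db(r0) >= reference_level_db:
            count += 1
    return count
```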

EP1265224A1
CLAIM 3
A method of converging an ITU Recommendation G . 729 Annex B voice activity detection (VAD) device , comprising the steps of : determining a noise identification threshold value ;
comparing a number of energy measures of a signal to said noise threshold value ;
determining a first value representing an average of said number of energy measures , when said energy measure is less than said noise threshold , wherein only the energy measures of said number of energy measures having values less than said noise threshold value are used to determine said first value (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) ;
determining a second value representing an average of said number of energy measures ;
and substituting said first value for said second value when the divergence between said first and second values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) increases with time .
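
A rough sketch of the convergence step in EP1265224A1 claim 3. The claim does not define how "divergence ... increases with time" is measured; comparing the absolute difference against the value from the previous call is an assumption, and all names are hypothetical.

```python
def converge_noise_average(energies, noise_threshold, prev_divergence):
    """Return (running average to use, current divergence)."""
    # Second value: average over all energy measures.
    second = sum(energies) / len(energies)
    # First value: average over only the measures below the noise threshold.
    below = [e for e in energies if e < noise_threshold]
    if not below:
        return second, prev_divergence
    first = sum(below) / len(below)
    divergence = abs(second - first)
    # Assumed reading: the divergence "increases" if it grew since the last call.
    if divergence > prev_divergence:
        second = first
    return second, divergence
```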

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (noise character) parameter in order to distinguish a music signal from a background noise signal and prevent update (n times) of noise energy estimates on the music signal .
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G . 729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation of said signal within a data frame , wherein said parameters are the autocorrelation coefficients , which are derived in accordance with said Recommendation G . 729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times (prevent update) a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values for averages of the noise characteristics (noise character) in accordance with said Recommendation G . 729 Annex B ;
and

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (autocorrelation coefficients, said first value, initial values, second values) and an average frame (autocorrelation coefficients, said first value, initial values, second values) energy .
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G . 729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation of said signal within a data frame , wherein said parameters are the autocorrelation coefficients (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) , which are derived in accordance with said Recommendation G . 729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) for averages of the noise characteristics in accordance with said Recommendation G . 729 Annex B ;
and

EP1265224A1
CLAIM 3
A method of converging an ITU Recommendation G . 729 Annex B voice activity detection (VAD) device , comprising the steps of : determining a noise identification threshold value ;
comparing a number of energy measures of a signal to said noise threshold value ;
determining a first value representing an average of said number of energy measures , when said energy measure is less than said noise threshold , wherein only the energy measures of said number of energy measures having values less than said noise threshold value are used to determine said first value (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) ;
determining a second value representing an average of said number of energy measures ;
and substituting said first value for said second value when the divergence between said first and second values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) increases with time .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (noise character) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (autocorrelation coefficients, said first value, initial values, second values) for the first group of frequency bands and a second energy (digital representation) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
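
The noise character parameter of claim 28 can be sketched as below. The claim does not state how the long-term value is formed; the exponential smoothing with factor `alpha`, the small floor on the denominator, and the function name are assumptions.

```python
def noise_character(band_energies, n_first, long_term, alpha=0.9):
    """Return (ratio of first-group to second-group energy, updated long-term value)."""
    # First group: the first n_first bands; second group: the rest.
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    # Floor the denominator to avoid division by zero (assumed handling).
    ratio = first_energy / max(second_energy, 1e-12)
    # Assumed long-term update: first-order recursive smoothing.
    long_term = alpha * long_term + (1.0 - alpha) * ratio
    return ratio, long_term
```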
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G . 729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation (second energy) of said signal within a data frame , wherein said parameters are the autocorrelation coefficients (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) , which are derived in accordance with said Recommendation G . 729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) for averages of the noise characteristics (noise character) in accordance with said Recommendation G . 729 Annex B ;
and

EP1265224A1
CLAIM 3
A method of converging an ITU Recommendation G . 729 Annex B voice activity detection (VAD) device , comprising the steps of : determining a noise identification threshold value ;
comparing a number of energy measures of a signal to said noise threshold value ;
determining a first value representing an average of said number of energy measures , when said energy measure is less than said noise threshold , wherein only the energy measures of said number of energy measures having values less than said noise threshold value are used to determine said first value (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) ;
determining a second value representing an average of said number of energy measures ;
and substituting said first value for said second value when the divergence between said first and second values (correlation value, term value, first energy value, second energy value, average frame, current frame energy, average frame energy) increases with time .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (noise character) parameter inferior than a given fixed threshold .
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G . 729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation of said signal within a data frame , wherein said parameters are the autocorrelation coefficients , which are derived in accordance with said Recommendation G . 729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values for averages of the noise characteristics (noise character) in accordance with said Recommendation G . 729 Annex B ;
and

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (noise character) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
EP1265224A1
CLAIM 2
The method of claim 1 , for initializing an ITU Recommendation G . 729 Annex B voice activity detection (VAD) , wherein : said step of extracting includes extracting a set of parameters characterizing said signal from a digital representation of said signal within a data frame , wherein said parameters are the autocorrelation coefficients , which are derived in accordance with said Recommendation G . 729 ;
said energy measure is calculated by calculating a full-band frame energy by multiplying a value of ten times a base ten logarithm of a quotient obtained by dividing a first autocorrelation coefficient R(0) , of said autocorrelation coefficients , by a constant value of 240 ;
said comparison of said energy with said reference value includes comparing said full-band frame energy with a reference level ;
said counting step includes changing the value of a frame counter during said initialization only if said full-band frame energy equals or exceeds said reference level ;
and further including : updating initial values for averages of the noise characteristics (noise character) in accordance with said Recommendation G . 729 Annex B ;
and




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7054808B2

Filed: 2002-04-29     Issued: 2006-05-30

Noise suppressing apparatus and noise suppressing method

(Original Assignee) Panasonic Corp     (Current Assignee) Panasonic Corp

Koji Yoshida
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (greater part) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (determining section) of the long term correlation map .
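
The last step of claim 1 combines an update factor, the current-frame correlation map, and an initial long-term map. One plausible reading is a first-order recursive update, sketched below. The recursion form, the value of `alpha`, and the sum-versus-threshold decision are all assumptions, not taken from the claim.

```python
import numpy as np

def update_long_term_map(lt_map, cor_map, alpha=0.9):
    """Blend the previous long-term map with the current-frame correlation map."""
    lt_map = np.asarray(lt_map, dtype=float)
    cor_map = np.asarray(cor_map, dtype=float)
    return alpha * lt_map + (1.0 - alpha) * cor_map

def is_tonally_stable(lt_map, threshold):
    """Assumed decision rule: compare the summed long-term map to a threshold."""
    return float(np.sum(lt_map)) > threshold
```

Starting from an all-zero initial map and feeding the same correlation map each frame, the long-term map rises toward that map at a rate set by `alpha`.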
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section (initial value) that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .
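
For comparison with the target claim, the suppression step of US7054808B2 claim 1 can be sketched as a per-bin maximum of two spectra. Reading "greater parts" as an element-wise maximum is an interpretation; the function names and the way the chosen coefficient is later applied are assumptions and are left out.

```python
import numpy as np

def suppressed_spectrum(speech_spec, noise_spec, lower_limit_coef):
    """Per bin, keep the larger of the subtraction and lower-limit spectra."""
    speech_spec = np.asarray(speech_spec, dtype=float)
    noise_spec = np.asarray(noise_spec, dtype=float)
    subtraction = speech_spec - noise_spec            # (a) subtraction spectrum
    lower_limit = lower_limit_coef * speech_spec      # (b) lower limit spectrum
    return np.maximum(subtraction, lower_limit)

def pick_update_coefficient(is_speech, snr, snr_threshold, first_coef, second_coef):
    """Select which predetermined coefficient drives the lower-limit update."""
    # First coefficient only when speech is present and the SNR is high.
    return first_coef if (is_speech and snr > snr_threshold) else second_coef
```

Nothing in this step computes residual spectra, peaks, or inter-frame correlation, which is where it differs from the elements of claim 1 of US8990073B2.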

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (greater part) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
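
The three steps of claim 2 (find minima, connect them into a spectral floor, subtract) can be sketched with piecewise-linear interpolation. Treating the first and last bins as minima, the plateau rule, and the function name are assumptions made so the example is self-contained; at least two bins are assumed.

```python
import numpy as np

def residual_spectrum(spectrum):
    """Return (residual spectrum, indices of the minima used for the floor)."""
    s = np.asarray(spectrum, dtype=float)
    n = len(s)
    # Interior local minima: lower than the left bin, not higher than the right.
    interior = [i for i in range(1, n - 1) if s[i] < s[i - 1] and s[i] <= s[i + 1]]
    # Assumption: anchor the floor at both ends of the spectrum.
    minima = [0] + interior + [n - 1]
    # "Connecting the minima": straight lines between consecutive minima.
    floor = np.interp(np.arange(n), minima, s[minima])
    return s - floor, minima
```

By construction the residual is zero at every minimum, so the pieces between consecutive minima are the peaks referred to in claim 1.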
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimation is performed according to claim 1 .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient (background noise signal) ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (conversion section) .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (conversion section) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (conversion section) comprises comparing an average signal-to-noise ratio (SNR_av) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR_LT) .
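
Claim 14 compares the average SNR with a threshold that depends on the long-term SNR but does not give the function. The linear form and the constants `slope` and `offset` below are purely hypothetical placeholders used to show the shape of the decision.

```python
def snr_activity_decision(snr_av, snr_lt, slope=0.1, offset=2.0):
    """Return True (active) when the average SNR exceeds the adaptive threshold."""
    # Hypothetical threshold function of the long-term SNR.
    threshold = slope * snr_lt + offset
    return snr_av > threshold
```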
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (conversion section) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (conversion section) further comprises updating the noise estimates for a next frame .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating (conversion section) of noise energy estimates when a music signal is detected .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section (sound activity detection, sound signal prevents updating) that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal (first coefficient) and prevent update of noise energy estimates on the music signal .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient (background noise signal) ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (greater part) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
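
The two steps of claim 24 can be sketched as follows. This is a minimal reading of the claim language only: the plain current-to-previous ratio, the uniform default weights, the `eps` guard against division by zero, and all names are assumptions, since the claim does not fix them.

```python
import numpy as np

def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    cur_energy, prev_energy: per-band energies of the current and previous frame.
    min_band: the "given number"; only bands above it are used.
    weights: optional per-band weights for the selected bands (assumed uniform).
    """
    cur = np.asarray(cur_energy, dtype=float)[min_band + 1:]
    prev = np.asarray(prev_energy, dtype=float)[min_band + 1:]
    # ratio between current-frame and previous-frame energy in each band
    ratio = cur / (prev + eps)
    if weights is None:
        weights = np.ones_like(ratio)  # assumption: equal weighting
    weights = np.asarray(weights, dtype=float)
    # weighted sum of the ratio over all selected bands
    return float(np.sum(weights * ratio))
```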
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands (greater part) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
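
The four steps of claim 28 can be sketched as below. The claim does not say how the group energies are formed or how the long-term value is obtained, so summing the band energies and using a first-order recursive average with factor `alpha` are both assumptions, as are the function names.

```python
import numpy as np

def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    e = np.asarray(band_energy, dtype=float)
    first_energy = e[:n_first].sum()    # first group of frequency bands
    second_energy = e[n_first:].sum()   # second group: the rest of the bands
    return first_energy / (second_energy + eps)

def update_long_term(prev_long_term, value, alpha=0.9):
    """Assumed smoothing: first-order recursive average of the parameter."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

Under claim 29, such a long-term value would then be compared with a fixed threshold to decide whether the noise energy estimates may be updated.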
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal (noise character parameter, activity prediction parameter) to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (greater part) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (determining section) of the long-term correlation map .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section (initial value) that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (greater part) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (determining section) of the long-term correlation map .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section (initial value) that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (greater part) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts (frequency spectrum, frequency bands, first frequency bands) selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient (background noise signal) ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first coefficient) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient (background noise signal) ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first coefficient) and preventing update of noise energy estimates .
US7054808B2
CLAIM 1
. A noise suppression apparatus comprising : a conversion section that converts an input speech signal to a speech spectrum in frame units ;
a speech/non-speech determining section that determines , on a per frame basis , whether or not the speech spectrum includes a speech component ;
a noise estimating section that estimates a noise spectrum based on the speech spectrum ;
an SNR calculating section that calculates a signal-to-noise ratio based on the speech spectrum and the noise spectrum ;
a suppression coefficient control section that : (i) updates a suppression lower limit coefficient using a first predetermined coefficient , when the speech spectrum includes a speech component and the signal-to-noise ratio is greater than a predetermined value , and (ii) for other cases , updates the suppression lower limit coefficient using a second predetermined coefficient , said second coefficient being greater than the first coefficient (background noise signal) ;
and a suppressed speech spectrum calculating section that : (i) compares : (a) a subtraction spectrum , in which the noise spectrum is subtracted from the speech spectrum , and (b) a subtraction lower limit spectrum , in which the speech spectrum is multiplied by the suppression lower limit coefficient , and (ii) outputs a suppression speech spectrum formed with greater parts selected from the subtraction spectrum and the subtraction lower limit spectrum .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7065486B1

Filed: 2002-04-11     Issued: 2006-06-20

Linear prediction based noise suppression

(Original Assignee) Mindspeed Technologies LLC     (Current Assignee) MACOM Technology Solutions Holdings Inc ; WIAV Solutions LLC

Jes Thyssen
US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US7065486B1
CLAIM 1
. A time-domain noise suppression method for suppressing a noise signal in a speech signal , said time (noise character parameter) -domain noise suppression method comprising : estimating a plurality of linear prediction coefficients for said speech signal ;
generating a prediction error estimate based on said plurality of prediction coefficients ;
generating an estimate of said speech signal based on said plurality of linear prediction coefficients ;
using a voice activity detector to determine a voice activity in said speech signal ;
updating a plurality of noise parameters based on said prediction error estimate if said voice activity detector determines no voice activity in said speech signal ;
generating an estimate of said noise signal based on said plurality of noise parameters ;
and passing said speech signal through a filter derived from said estimate of said noise signal and said estimate of said speech signal to generate a clean speech signal estimate ;
wherein said plurality of linear prediction coefficients are associated with a short-term linear predictor indicative of a spectral envelope of said speech signal and a long-term linear predictor indicative of a pitch periodicity of said speech signal , and wherein said plurality of noise parameters include a spectral estimate of said noise signal and a residual energy of said noise signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (linear prediction coefficient) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US7065486B1
CLAIM 1
. A time-domain noise suppression method for suppressing a noise signal in a speech signal , said time-domain noise suppression method comprising : estimating a plurality of linear prediction coefficients (binary decision) for said speech signal ;
generating a prediction error estimate based on said plurality of prediction coefficients ;
generating an estimate of said speech signal based on said plurality of linear prediction coefficients ;
using a voice activity detector to determine a voice activity in said speech signal ;
updating a plurality of noise parameters based on said prediction error estimate if said voice activity detector determines no voice activity in said speech signal ;
generating an estimate of said noise signal based on said plurality of noise parameters ;
and passing said speech signal through a filter derived from said estimate of said noise signal and said estimate of said speech signal to generate a clean speech signal estimate ;
wherein said plurality of linear prediction coefficients are associated with a short-term linear predictor indicative of a spectral envelope of said speech signal and a long-term linear predictor indicative of a pitch periodicity of said speech signal , and wherein said plurality of noise parameters include a spectral estimate of said noise signal and a residual energy of said noise signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US7065486B1
CLAIM 1
. A time-domain noise suppression method for suppressing a noise signal in a speech signal , said time (noise character parameter) -domain noise suppression method comprising : estimating a plurality of linear prediction coefficients for said speech signal ;
generating a prediction error estimate based on said plurality of prediction coefficients ;
generating an estimate of said speech signal based on said plurality of linear prediction coefficients ;
using a voice activity detector to determine a voice activity in said speech signal ;
updating a plurality of noise parameters based on said prediction error estimate if said voice activity detector determines no voice activity in said speech signal ;
generating an estimate of said noise signal based on said plurality of noise parameters ;
and passing said speech signal through a filter derived from said estimate of said noise signal and said estimate of said speech signal to generate a clean speech signal estimate ;
wherein said plurality of linear prediction coefficients are associated with a short-term linear predictor indicative of a spectral envelope of said speech signal and a long-term linear predictor indicative of a pitch periodicity of said speech signal , and wherein said plurality of noise parameters include a spectral estimate of said noise signal and a residual energy of said noise signal .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
US7065486B1
CLAIM 1
. A time-domain noise suppression method for suppressing a noise signal in a speech signal , said time (noise character parameter) -domain noise suppression method comprising : estimating a plurality of linear prediction coefficients for said speech signal ;
generating a prediction error estimate based on said plurality of prediction coefficients ;
generating an estimate of said speech signal based on said plurality of linear prediction coefficients ;
using a voice activity detector to determine a voice activity in said speech signal ;
updating a plurality of noise parameters based on said prediction error estimate if said voice activity detector determines no voice activity in said speech signal ;
generating an estimate of said noise signal based on said plurality of noise parameters ;
and passing said speech signal through a filter derived from said estimate of said noise signal and said estimate of said speech signal to generate a clean speech signal estimate ;
wherein said plurality of linear prediction coefficients are associated with a short-term linear predictor indicative of a spectral envelope of said speech signal and a long-term linear predictor indicative of a pitch periodicity of said speech signal , and wherein said plurality of noise parameters include a spectral estimate of said noise signal and a residual energy of said noise signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030144840A1

Filed: 2002-01-30     Issued: 2003-07-31

Method and apparatus for speech detection using time-frequency variance

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Google Technology Holdings LLC

Changxue Ma, Mark Randolph
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long term correlation map .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .
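
The power measurement recited in claim 5 of US20030144840A1 (a sum of squared speech samples per frame and sub-band) can be illustrated with a minimal sketch. The nested-list layout `samples[i][j][k]` and the function name `subband_power` are assumptions made for illustration; they do not come from the reference.

```python
def subband_power(samples):
    """Return X[i][j] = sum over k of samples[i][j][k] ** 2.

    samples[i][j] is assumed to hold the speech samples of frame i
    in frequency sub-band j (hypothetical layout).
    """
    return [[sum(s * s for s in band) for band in frame] for frame in samples]


# Two frames, two sub-bands each, two samples per sub-band.
frames = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[2.0, 0.0], [0.0, 0.0]],
]
power = subband_power(frames)
print(power)
```

The sketch only shows the arithmetic of the quoted formula; it says nothing about how the sub-bands are obtained.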

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (given frequency) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .
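
The three steps of claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is an illustrative reading, not the patent's implementation: treating the first and last bins as anchor minima, using straight-line interpolation to "connect" the minima, and all function names are assumptions.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are added as anchors (assumption)."""
    idx = [0]
    for k in range(1, len(spectrum) - 1):
        if spectrum[k] < spectrum[k - 1] and spectrum[k] <= spectrum[k + 1]:
            idx.append(k)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum of the current frame."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


residual, minima = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
print(minima, residual)
```

By construction the residual is zero at every minimum, so only the portions of the spectrum between successive minima stand out.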

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update (time sample) of noise energy estimates on the music signal .
US20030144840A1
CLAIM 6
. A method according to claim 2 , wherein step (b) of calculating a variance of the plurality of power measurements comprises $\mathrm{VAR} = \frac{\sum X_{ij}^{2}}{n} - \left( \frac{\sum X_{ij}}{n} \right)^{2}$ , wherein i is a frame index ;
wherein j is a frequency sub-band index ;
wherein $X_{ij}$ is the power measurement for a given time sample (prevent update) index i and a given frequency sub-band j .
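
The variance in claim 6 of US20030144840A1 can be illustrated with a short sketch. It reads the flattened formula as "mean of squares minus square of the mean" and assumes n is the number of power measurements; the quoted claim does not define n, so both points are assumptions, as is the function name.

```python
def power_variance(x):
    """Variance of power measurements: mean(x^2) - mean(x)^2.

    x is a flat list of power measurements X_ij; n is taken to be
    len(x) (an assumption, since the claim text leaves n undefined).
    """
    n = len(x)
    if n == 0:
        raise ValueError("need at least one power measurement")
    mean_of_squares = sum(v * v for v in x) / n
    mean = sum(x) / n
    return mean_of_squares - mean * mean


print(power_variance([1.0, 2.0, 3.0, 4.0]))
print(power_variance([2.0, 2.0, 2.0]))
```

A constant sequence of power measurements yields zero variance, while fluctuating measurements yield a larger value.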

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (given frequency) energy and an average frame energy .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (given frequency) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .
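
The spectral diversity computation recited in claim 24 of US8990073B2 (a per-band energy ratio between the current and previous frame, combined as a weighted sum over the bands above a given number) might be sketched as below. The weights, the lower-bound guard `eps`, and the names `spectral_diversity` and `min_band` are hypothetical; the claim does not specify them.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above min_band.

    cur_energy, prev_energy and weights are per-band lists of equal length.
    eps only guards against division by zero (assumption).
    """
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[b] / max(prev_energy[b], eps)
        total += weights[b] * ratio
    return total


cur = [1.0, 2.0, 4.0, 8.0]
prev = [1.0, 1.0, 2.0, 2.0]
# Bands 2 and 3 contribute: 0.5 * (4 / 2) + 0.5 * (8 / 2)
print(spectral_diversity(cur, prev, min_band=1, weights=[0.5] * 4))
```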

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long-term correlation map .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long-term correlation map .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (given frequency) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030144840A1
CLAIM 5
. A method according to claim 2 , wherein step (a) of calculating a plurality of power samples of speech comprises $X_{ij} = \sum_{k} s_{ijk}^{2}$ , wherein i is the frame index ;
wherein j is a frequency sub-band index ;
wherein k is the sample index within a frame ;
and wherein $s_{ijk}$ is the speech samples for a given frame index i , a given frequency (current frame) sub-band j and a given sample index k .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (bandpass filter) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20030144840A1
CLAIM 1
. A speech presence detection apparatus , comprising : a plurality of bandpass filters (average signal) for splitting speech into a bank of sub-bands ;
a plurality of shift registers each connected to and associated with one of the bandpass filters for storing the speech of a corresponding sub-band in register elements ;
a power determining circuit for determining individual power measurements of the speech stored in each register element ;
a variance combining circuit for combining the individual power measurements to provide a variance for the individual registers ;
and a comparitor circuit for comparing the variance with a threshold to indicate whether speech is detected .




US7065485B1

Filed: 2002-01-09     Issued: 2006-06-20

Enhancing speech intelligibility using variable-rate time-scale modification

(Original Assignee) AT&T Corp     (Current Assignee) Nuance Communications Inc

Nicola R. Chong-White, Richard Vandervoort Cox
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (differential pulse) indicative of sound activity in the sound signal .
US7065485B1
CLAIM 18
. The method of claim 17 , wherein the speech coder is selected from the group consisting of a code excited linear predication (CELP) coder , a vector sum excitation prediction (VSELP) coder , a waveform interpolation (WI) coder , a multiband excitation (MBE) coder , an improved multiband excitation (IMBE) coder , a mixed excitation linear prediction (MELP) coder , a linear prediction coding (LPC) coder , a pulse code modulation (PCM) coder , a differential pulse (adaptive threshold) code modulation (DPCM) coder , and an adaptive differential pulse code modulation (ADPCM) coder .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (linear prediction) residual error energies .
US7065485B1
CLAIM 18
. The method of claim 17 , wherein the speech coder is selected from the group consisting of a code excited linear predication (CELP) coder , a vector sum excitation prediction (VSELP) coder , a waveform interpolation (WI) coder , a multiband excitation (MBE) coder , an improved multiband excitation (IMBE) coder , a mixed excitation linear prediction (linear prediction, residual error) (MELP) coder , a linear prediction coding (LPC) coder , a pulse code modulation (PCM) coder , a differential pulse code modulation (DPCM) coder , and an adaptive differential pulse code modulation (ADPCM) coder .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (second value) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US7065485B1
CLAIM 1
. A method for enhancing speech intelligibility of a speech signal , comprising : performing syllable segmentation on a frame of the speech signal in order to detect a syllable ;
dynamically determining a scaling factor for a segment of speech in response to performing syllable segmentation on a frame of the speech signal in order to detect a syllable , wherein the segment is contained in the frame ;
applying the scaling factor to the segment in order to modify a time scaling to the segment ;
and blending the segment with an overlapping segment in order to essentially retain a frequency attribute of the speech signal that is processed , wherein : the syllable is a time-scale modification syllable (TSMS) comprising a consonant-vowel transition and a steady-state vowel , and dynamically determining a scaling factor for a segment of speech comprises : setting the scaling factor to a first value , wherein time expansion occurs during the consonant-vowel transition ;
and setting the scaling factor to a second value (noise ratio) , wherein time compression occurs during the steady-state vowel .




JP2003195881A

Filed: 2001-12-28     Issued: 2003-07-09

Frequency-transform block-length adaptive conversion device and program

(Original Assignee) Victor Co Of Japan Ltd; 日本ビクター株式会社     

Takao Yamabe, 孝朗 山辺
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (周波数スペクトル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2003195881A
CLAIM 1
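[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .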
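
Claim 1 of US8990073B2 states only that the long-term correlation map is calculated from an update factor, the correlation map of the current frame, and an initial value. One plausible reading is a first-order recursive average, sketched below. The recursion form, the value of `alpha`, and the all-zero initial map are assumptions for illustration and are not taken from the claim text.

```python
def update_long_term_map(long_term, cor_map, alpha):
    """One recursive-averaging step per frequency bin (assumed form).

    alpha is the update factor: values near 1 keep more of the history.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]


long_term = [0.0, 0.0, 0.0]  # assumed initial value of the long-term map
for cor_map in ([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]):
    long_term = update_long_term_map(long_term, cor_map, alpha=0.5)
print(long_term)
```

Under this reading, bins whose correlation stays high across frames accumulate large long-term values, which is the kind of persistence a tonal-stability measure would look for.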

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (周波数スペクトル) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2003195881A
CLAIM 1
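[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .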

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (周波数スペクトル) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
JP2003195881A
CLAIM 1
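[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .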
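
Claim 3 of US8990073B2 locates a maximum between each pair of consecutive minima of the residual spectrum. A minimal sketch of that step follows; it assumes the minima indices are already known and sorted, and the function name and tuple layout are hypothetical.

```python
def detect_peaks(residual, minima):
    """For each pair of consecutive minima (a, b), return (a, b, k_max),
    where k_max is the index of the largest residual value in [a, b].
    """
    peaks = []
    for a, b in zip(minima, minima[1:]):
        k_max = max(range(a, b + 1), key=lambda k: residual[k])
        peaks.append((a, b, k_max))
    return peaks


print(detect_peaks([0.0, 2.0, 0.0, 3.5, 0.0], [0, 2, 4]))
```

Each returned tuple delimits one "piece" of the residual spectrum between successive minima, together with the position of its maximum.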

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (周波数スペクトル) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
JP2003195881A
CLAIM 1
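[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .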
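
The correlation map of claim 4 of US8990073B2 assigns to every frequency bin of a peak the normalized correlation between the current and previous residual spectra over that peak. The sketch below uses a cosine-style normalization, which is an assumption; the quoted claim does not give the exact normalization. Function and variable names are hypothetical.

```python
import math


def correlation_map(cur, prev, minima):
    """Per-bin map holding each peak's normalized correlation score.

    A peak spans the bins between two consecutive minima (a, b).
    A minimum shared by two peaks ends up with the later peak's score.
    """
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur[k] * prev[k] for k in bins)
        den = math.sqrt(sum(cur[k] ** 2 for k in bins) *
                        sum(prev[k] ** 2 for k in bins))
        score = num / den if den > 0.0 else 0.0  # zero energy: no correlation
        for k in bins:
            cmap[k] = score
    return cmap


cur = [0.0, 2.0, 0.0, 3.0, 0.0]
prev = [0.0, 2.0, 0.0, 0.0, 0.0]
print(correlation_map(cur, prev, [0, 2, 4]))
```

In this example the first peak is unchanged between frames and scores 1.0, while the second peak has no counterpart in the previous frame and scores 0.0.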

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision (決定手段) based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2003195881A
CLAIM 1
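[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means (決定手段 ; binary decision , update decision) for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .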

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (決定手段) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
JP2003195881A
CLAIM 1
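[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means (決定手段 ; binary decision , update decision) for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .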

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (周波数スペクトル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2003195881A
CLAIM 1
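[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .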

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (周波数スペクトル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2003195881A
CLAIM 1
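[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .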

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (周波数スペクトル) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2003195881A
CLAIM 1
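[Claim 1] A frequency-transform block-length adaptive conversion device that adaptively switches the block length of a frequency-transform block in audio transform coding , comprising : change-amount acquisition means for acquiring a temporal change amount of a frequency spectrum (周波数スペクトル ; current residual spectrum) for each of a plurality of analysis blocks , each time-shifted by 1/2 of the block width of an analysis block of a predetermined number of samples of an input audio signal ; comparison means for comparing , within a predetermined frequency matching range centered on the frequency spectrum at which the change amount is largest , the temporal change amount of the frequency spectrum acquired by the change-amount acquisition means with a preset threshold ; and block transform width determination means for detecting whether the total number of times the comparison means obtained a comparison result in which the change amount exceeded the threshold has exceeded a predetermined set value , and determining the block length according to the detection result .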




US20030110029A1

Filed: 2001-12-07     Issued: 2003-06-12

Noise detection and cancellation in communications systems

(Original Assignee) Nortel Networks Ltd     (Current Assignee) Nortel Networks Ltd

Masoud Ahmadi, Joachim Fouret, Marian Neagoe
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (comparison means) , and an initial value of the long term correlation map .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (comparison means) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (successive frames) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20030110029A1
CLAIM 1
. A method of distinguishing noise from speech signals in a communications path , the method comprising ;
storing a sequence of frames of signal samples , comparing successive frames (correlation value) so as to determine a measure of similarity therebetween , and determining the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .
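
The decision rule in claim 1 of US20030110029A1 (high similarity between successive frames indicates noise, low similarity indicates speech) can be illustrated with a small sketch. The claim does not say how similarity is measured or what threshold is used; the normalized correlation and the value 0.9 below are assumptions, as are the function names.

```python
import math


def frame_similarity(a, b):
    """Normalized correlation of two equal-length frames (assumed measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def classify(prev_frame, cur_frame, threshold=0.9):
    """Return 'noise' for high similarity, 'speech' for low similarity.

    threshold is a hypothetical tuning value, not taken from the reference.
    """
    if frame_similarity(prev_frame, cur_frame) >= threshold:
        return "noise"
    return "speech"


print(classify([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]))
print(classify([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]))
```

Note that this rule compares whole frames of signal samples, whereas the charted claims of US8990073B2 correlate individual peaks of a residual spectrum.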

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (includes means) and a sixteenth order (includes means) of linear prediction residual error energies .
US20030110029A1
CLAIM 13
. Apparatus as claimed in claim 8 , wherein the communications path includes an echo canceller , and wherein the apparatus includes means (second order, sixteenth order) for disabling the echo canceller in the absence of speech signals and the presence of noise signals .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (comparison means) energy and an average frame energy .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (comparison means) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (comparison means) , and an initial value of the long-term correlation map .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (comparison means) , and an initial value of the long-term correlation map .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (comparison means) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030110029A1
CLAIM 11
. Apparatus for distinguishing noise from speech signals in a communications path , the apparatus comprising ;
a store for storing a sequence of frames of signal samples , and comparison means (current frame) for comparing successive frames so as to determine a measure of similarity therebetween , and thereby determine the signal to be speech or noise when said successive frames are found to have respectively a low or high similarity .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6785645B2

Filed: 2001-11-29     Issued: 2004-08-31

Real-time speech and music classifier

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Hosam Adel Khalil, Vladimir Cuperman, Tian Wang
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value (predefined criterion) of the long term correlation map .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US6785645B2
CLAIM 14
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames including the current frame and at least one subsequent frame , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence , including the current frame and the at least one subsequent frame , according to one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame of the frames as a speech data or music data fame .
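
Note on the long-term correlation map element of claim 1: the claim recites only that the map is calculated from an update factor, the correlation map of the current frame, and an initial value. One plausible reading is a per-bin recursive average. The sketch below assumes that form; the function names, the update factor `alpha`, the initial value of 0.0, and the summed-map threshold test are illustrative assumptions and are not taken from the claim text.

```python
def update_long_term_map(long_term, current, alpha=0.9):
    """Per-bin recursive update (assumed form): new = alpha * old + (1 - alpha) * current."""
    if len(long_term) != len(current):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, current)]


def is_tonally_stable(long_term, threshold):
    """Hypothetical decision rule: the summed long-term map exceeds a threshold."""
    return sum(long_term) > threshold


# Assumed initial value of 0.0 in every bin.
lt_map = [0.0] * 4
# Bins 0-1 correlate strongly from frame to frame, bins 2-3 do not.
frame_map = [1.0, 1.0, 0.0, 0.0]
for _ in range(3):
    lt_map = update_long_term_map(lt_map, frame_map, alpha=0.5)
```

With a persistent tone the stable bins climb toward 1.0 over successive frames, while bins without frame-to-frame correlation stay near the initial value.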

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (data frames) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .
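
Note on claim 2: the three recited steps (search for minima, connect them to form the spectral floor, subtract the floor) can be sketched as follows. This is a minimal illustration, assuming the floor is a piecewise-linear interpolation between minima and that the first and last bins are treated as minima; neither assumption comes from the claim text, and all function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; first and last bins are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("need at least two bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the floor from the spectrum; also return the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


spectrum = [1.0, 3.0, 1.0, 5.0, 2.0]
residual, minima = residual_spectrum(spectrum)
```

By construction the residual is zero at every minimum, so the pieces between successive minima are the candidate peaks of the later claim elements.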

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (Mahalanobis distance) of the current residual spectrum .
US6785645B2
CLAIM 12
. The method of claim 11 , wherein the pattern recognition method comprises the steps of : calculating a separate Mahalanobis distance (two consecutive minima) value from the feature point of each frame to the center of a speech frame feature pattern and the center of a music frame feature pattern ;
calculating a likelihood value of each frame based on the Mahalanobis distance value for the frame ;
and classifying each frame based , at least in part , on its calculated likelihood value .
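
Note on claim 3: locating a maximum between each pair of consecutive minima of the residual spectrum can be sketched as below. The function name and the returned tuple layout are illustrative assumptions; the minima are taken as given input.

```python
def detect_peaks(residual, minima):
    """Return (start_bin, end_bin, max_bin) for each piece between successive minima."""
    peaks = []
    for a, b in zip(minima, minima[1:]):
        # Bin index of the largest residual value inside the piece [a, b].
        max_bin = max(range(a, b + 1), key=lambda i: residual[i])
        peaks.append((a, b, max_bin))
    return peaks


residual = [0.0, 2.0, 0.0, 3.5, 0.0]
minima = [0, 2, 4]
peaks = detect_peaks(residual, minima)
```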

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (pattern recognition method) with the previous residual spectrum , over frequency bins between two consecutive minima (Mahalanobis distance) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6785645B2
CLAIM 12
. The method of claim 11 , wherein the pattern recognition method (correlation value) comprises the steps of : calculating a separate Mahalanobis distance (two consecutive minima) value from the feature point of each frame to the center of a speech frame feature pattern and the center of a music frame feature pattern ;
calculating a likelihood value of each frame based on the Mahalanobis distance value for the frame ;
and classifying each frame based , at least in part , on its calculated likelihood value .
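
Note on claim 4: the correlation map assigns to every frequency bin of a peak the normalized correlation between the current and previous residual spectra over that peak. The sketch below assumes a conventional normalized cross-correlation (dot product divided by the product of the two norms); the exact normalization is not stated in the claim, and the function name is hypothetical.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding each peak's normalized correlation with the previous frame."""
    cor_map = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(current[i] * previous[i] for i in bins)
        den = math.sqrt(sum(current[i] ** 2 for i in bins)
                        * sum(previous[i] ** 2 for i in bins))
        score = num / den if den > 0.0 else 0.0
        # A minimum shared by two adjacent peaks takes the later peak's score.
        for i in bins:
            cor_map[i] = score
    return cor_map


current = [0.0, 2.0, 0.0, 3.0, 0.0]
previous = [0.0, 2.0, 0.0, 0.0, 0.0]   # the second peak was absent in the previous frame
cor_map = correlation_map(current, previous, [0, 2, 4])
```

The first peak keeps its shape between frames and scores 1.0; the second has no counterpart in the previous frame and scores 0.0.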

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (Euclidean distance) parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6785645B2
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (data frames) energy (switching time) and an average frame energy .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US6785645B2
CLAIM 14
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames including the current frame and at least one subsequent frame , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time (second energy, current frame energy, second energy value) for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence , including the current frame and the at least one subsequent frame , according to one or more long-term features according to a predefined criterion , and providing a set of classification information classifying at least one frame of the frames as a speech data or music data fame .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (data frames) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .
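
Note on claim 24: the spectral diversity parameter is recited as a weighted sum of per-band energy ratios (current frame over previous frame) for bands above a given number. A minimal sketch follows, assuming unit weights by default and skipping bands whose previous energy is zero; the weights, the zero-energy handling, and the function name are assumptions, not claim language.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of cur/prev energy ratios over bands with index > min_band."""
    if len(cur_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed: equal weighting
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        if prev_energy[b] <= 0.0:
            continue  # assumed: ignore bands with no previous energy
        total += weights[b] * cur_energy[b] / prev_energy[b]
    return total


cur = [1.0, 1.0, 4.0, 9.0]
prev = [1.0, 1.0, 2.0, 3.0]
div = spectral_diversity(cur, prev, min_band=1)
```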

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (Euclidean distance) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (switching time) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6785645B2
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US6785645B2
CLAIM 14
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames including the current frame and at least one subsequent frame , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time (second energy, current frame energy, second energy value) for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence , including the current frame and the at least one subsequent frame , according to one or more long-term features according to a predefined criterion , and providing a set of classification information classifying at least one frame of the frames as a speech data or music data fame .
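
Note on claim 28: the noise character parameter is recited as a ratio between the energies of two groups of frequency bands, followed by a long-term value. The sketch below assumes the first group is the lowest `split` bands and that the long-term value is a recursive average; the split point, the smoothing factor `alpha`, and the function names are illustrative assumptions.

```python
def noise_character(band_energies, split):
    """Ratio of the energy in the first `split` bands to the energy in the rest."""
    first = sum(band_energies[:split])
    second = sum(band_energies[split:])
    return first / second if second > 0.0 else float("inf")


def long_term_value(previous_lt, value, alpha=0.9):
    """Assumed recursive smoothing of the per-frame parameter."""
    return alpha * previous_lt + (1.0 - alpha) * value


energies = [4.0, 4.0, 1.0, 1.0]
nc = noise_character(energies, split=2)
nc_lt = long_term_value(0.0, nc, alpha=0.5)
```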

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (Euclidean distance) parameter inferior than a given fixed threshold .
US6785645B2
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value (predefined criterion) of the long-term correlation map .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US6785645B2
CLAIM 14
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames including the current frame and at least one subsequent frame , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence , including the current frame and the at least one subsequent frame , according to one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame of the frames as a speech data or music data fame .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value (predefined criterion) of the long-term correlation map .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US6785645B2
CLAIM 14
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames including the current frame and at least one subsequent frame , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence , including the current frame and the at least one subsequent frame , according to one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame of the frames as a speech data or music data fame .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (data frames) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6785645B2
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames (current frame) including the current frame and at least one subsequent frame in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames , including the current frame and the at least one subsequent frame ;
extracting at least one long-term feature and at least one short-term feature relative to each of the current frame and the at least one subsequent frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (test window) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6785645B2
CLAIM 5
. The method of claim 1 , wherein the step of determining whether to switch further comprises the steps of : defining a switching-test window (average signal) ;
analyzing a classified sequence of frames in the window to generate a determination whether to switch ;
and if a determination to switch is generated , generating a switching instruction .
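
Note on claim 38: the comparator tests the average signal-to-noise ratio (SNR av) against a threshold that is a function of the long-term signal-to-noise ratio (SNR LT). The claim does not give the function. The sketch below assumes a linear one purely for illustration; `slope`, `offset`, and both function names are hypothetical.

```python
def vad_threshold(snr_lt, slope=0.1, offset=1.0):
    """Hypothetical linear threshold as a function of the long-term SNR."""
    return slope * snr_lt + offset


def sound_active(snr_av, snr_lt):
    """Declare sound activity when the average SNR exceeds the threshold."""
    return snr_av > vad_threshold(snr_lt)


active = sound_active(snr_av=5.0, snr_lt=20.0)     # threshold is about 3.0
inactive = sound_active(snr_av=2.0, snr_lt=20.0)
```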

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (Euclidean distance) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6785645B2
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030101050A1

Filed: 2001-11-29     Issued: 2003-05-29

Real-time speech and music classifier

(Original Assignee) Microsoft Corp     (Current Assignee) Microsoft Technology Licensing LLC

Hosam Khalil, Vladimir Cuperman, Tian Wang
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value (predefined criterion) of the long term correlation map .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US20030101050A1
CLAIM 16
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence according to one or more long-term features extracted from the frame , determining whether to switch a current operating mode of the encoder based on the one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame as a speech data or music data frame .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (Mahalanobis distance) of the current residual spectrum .
US20030101050A1
CLAIM 12
. The method of claim 11 , wherein the pattern recognition method comprises the steps of : calculating a separate Mahalanobis distance (two consecutive minima) value from the feature point of each frame to the center of a speech frame feature pattern and the center of a music frame feature pattern ;
calculating a likelihood value of each frame based on the Mahalanobis distance value for the frame ;
and classifying each frame based , at least in part , on its calculated likelihood value .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (pattern recognition method) with the previous residual spectrum , over frequency bins between two consecutive minima (Mahalanobis distance) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20030101050A1
CLAIM 12
. The method of claim 11 , wherein the pattern recognition method (correlation value) comprises the steps of : calculating a separate Mahalanobis distance (two consecutive minima) value from the feature point of each frame to the center of a speech frame feature pattern and the center of a music frame feature pattern ;
calculating a likelihood value of each frame based on the Mahalanobis distance value for the frame ;
and classifying each frame based , at least in part , on its calculated likelihood value .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (Euclidean distance) parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20030101050A1
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy (switching time) and an average frame energy .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US20030101050A1
CLAIM 14
. A classifier for use in a coder having at least a speech coding mode and a music coding mode for switching the coder between the speech and music coding modes , the classifier comprising : a look-ahead buffer for storing a received sequence of frames ;
a feature extractor for extracting one or more long-term features and one or more short-term features for each frame in the buffer , wherein the long-term features and short-term features are capable of distinguishing a frame comprising speech data from a frame comprising music data , and for outputting the extracted long-term features and short-term features ;
and a classification module for receiving the long-term and short-term features from the feature extractor and classifying each frame according to its long-term features and indicating a switching time (second energy, current frame energy, second energy value) for the coder to change its coding mode according to the short-term features .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .
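
As a reading aid, the spectral diversity computation recited in US8990073B2 claim 24 (per-band energy ratio between the current and previous frame, then a weighted sum over the bands above a given index) might look like the following Python sketch. The weights, the band index convention, and the small floor used to avoid division by zero are assumptions, not values taken from the patent.

```python
def spectral_diversity(cur_energy, prev_energy, weights, min_band, floor=1e-12):
    """Weighted sum of current/previous energy ratios for bands above min_band.

    cur_energy, prev_energy, weights: per-band lists of equal length.
    min_band: bands with index greater than this value are used (assumed convention).
    floor: hypothetical guard against a zero previous-frame energy.
    """
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[band] / max(prev_energy[band], floor)
        total += weights[band] * ratio
    return total
```

For example, with current energies [1, 2, 4, 8], previous energies [1, 1, 2, 2], weights [1, 1, 0.5, 0.25] and min_band = 1, only bands 2 and 3 contribute, giving 0.5 * 2 + 0.25 * 4.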

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (Euclidean distance) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (switching time) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20030101050A1
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US20030101050A1
CLAIM 14
. A classifier for use in a coder having at least a speech coding mode and a music coding mode for switching the coder between the speech and music coding modes , the classifier comprising : a look-ahead buffer for storing a received sequence of frames ;
a feature extractor for extracting one or more long-term features and one or more short-term features for each frame in the buffer , wherein the long-term features and short-term features are capable of distinguishing a frame comprising speech data from a frame comprising music data , and for outputting the extracted long-term features and short-term features ;
and a classification module for receiving the long-term and short-term features from the feature extractor and classifying each frame according to its long-term features and indicating a switching time (second energy, current frame energy, second energy value) for the coder to change its coding mode according to the short-term features .
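
The noise character computation recited in US8990073B2 claim 28 (split the bands into two groups, sum the energy in each, take the ratio, then track a long-term value) can be sketched as follows. The exponential smoothing used for the long-term value, the smoothing factor, and the function names are assumptions for illustration; the claim only requires that a long-term value be calculated from the ratio.

```python
def noise_character(band_energies, n_first, floor=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    # floor is a hypothetical guard against an empty or silent second group.
    return first / max(second, floor)

def update_long_term(previous_lt, current, alpha=0.5):
    """Long-term value as exponential smoothing (assumed form, assumed alpha)."""
    return alpha * previous_lt + (1.0 - alpha) * current
```

With band energies [4, 4, 1, 1] and n_first = 2, the group energies are 8 and 2, so the ratio is 4; starting from a long-term value of 2 and alpha = 0.5, the updated long-term value is 3.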

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (Euclidean distance) parameter inferior than a given fixed threshold .
US20030101050A1
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value (predefined criterion) of the long-term correlation map .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US20030101050A1
CLAIM 16
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence according to one or more long-term features extracted from the frame , determining whether to switch a current operating mode of the encoder based on the one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame as a speech data or music data frame .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value (predefined criterion) of the long-term correlation map .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US20030101050A1
CLAIM 16
. A coder system for coding a sequence of audio frames composed of speech data frames and music data frames , the coder system comprising : an encoder having multiple operating modes , at least one of which is for encoding speech data and another of which is for encoding music data ;
and an encoding classifier in communication with the encoder , wherein the encoding classifier is adapted for determining a potential switching time for the encoder to switch its operating mode based on one or more extracted short-term features of a frame , classifying each frame in the sequence according to one or more long-term features extracted from the frame , determining whether to switch a current operating mode of the encoder based on the one or more long-term features according to a predefined criterion (initial value) , and providing a set of classification information classifying at least one frame as a speech data or music data frame .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030101050A1
CLAIM 1
. A method of classifying a current coding frame in a sequence of audio data frames in real-time for switching a multi-mode audio coding system operated in a current coding mode between different modes , the method comprising : recording the sequence of audio data frames ;
extracting at least one long-term feature and at least one short-term feature relative to each audio data frame , wherein the features substantially exhibit distinct values for different signal types ;
detecting a potential switch point according to the at least one short-term feature of the current frame (current frame) and the current coding mode ;
and determining whether to switch the current coding mode of the coding system at the potential switch point based on the at least one long-term feature .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (test window) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20030101050A1
CLAIM 5
. The method of claim 1 , wherein the step of determining whether to switch further comprises the steps of : defining a switching-test window (average signal) ;
analyzing a classified sequence of frames in the window to generate a determination whether to switch ;
if a determination to switch is generated , generating a switching instruction .
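
To make the comparator of US8990073B2 claim 38 concrete, here is a minimal Python sketch in which the threshold is a linear function of the long-term SNR. The linear form and the slope and offset values are purely hypothetical; the claim states only that the threshold is a function of SNR LT.

```python
def sound_activity_detected(snr_av, snr_lt, slope=0.25, offset=1.0):
    """Compare the average SNR with a threshold that depends on the long-term SNR.

    slope and offset are placeholder values chosen for this sketch.
    """
    threshold = slope * snr_lt + offset
    return snr_av > threshold
```

With a long-term SNR of 20, the sketch threshold is 6, so an average SNR of 7 is flagged as active and an average SNR of 5 is not.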

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (Euclidean distance) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20030101050A1
CLAIM 13
. The method of claim 12 , further comprising the step of calculating a separate Euclidean distance (noise character) in the feature space .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030101052A1

Filed: 2001-10-05     Issued: 2003-05-29

Voice recognition and activation system

(Original Assignee) VIPEX TECHNOLOGIES Inc     (Current Assignee) VIPEX TECHNOLOGIES Inc

Lang Chen, Michael Yeung, Zhenyu Liu
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (middle line) , and an initial value of the long term correlation map .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .

US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .
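
The last element of US8990073B2 claim 1 names three inputs to the long-term correlation map: an update factor, the correlation map of the current frame, and an initial value. One plausible reading, shown here as a Python sketch, is a per-bin recursive average. The recursion form and the names alpha and initial_value are assumptions and are not taken from the claim.

```python
def init_long_term_map(num_bins, initial_value=0.0):
    """Start the long-term map at a chosen initial value in every bin."""
    return [initial_value] * num_bins

def update_long_term_map(long_term_map, cor_map, alpha):
    """Blend the previous long-term map with the current frame's correlation map."""
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term_map, cor_map)]
```

Feeding the same correlation map [1, 1, 0] three times with alpha = 0.5 and a zero initial value moves the first two bins through 0.5, 0.75 and 0.875 while the third bin stays at 0, which is the kind of frame-to-frame persistence the claim associates with tonal stability.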

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency band) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (middle line) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .

US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .
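
The three steps of US8990073B2 claim 2 (search for minima, connect them to estimate the spectral floor, subtract the floor) can be sketched in Python as below. Straight-line interpolation between minima and the treatment of the first and last bins as minima are assumptions made so the floor covers every bin; the input must contain at least two bins.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included (sketch assumption)."""
    minima = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(len(spectrum) - 1)
    return minima

def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the minima."""
    minima = find_minima(spectrum)
    floor = [0.0] * len(spectrum)
    for left, right in zip(minima, minima[1:]):
        for i in range(left, right + 1):
            t = (i - left) / (right - left)
            floor[i] = spectrum[left] + t * (spectrum[right] - spectrum[left])
    return [s - f for s, f in zip(spectrum, floor)]
```

For the toy spectrum [1, 3, 1, 5, 2] the minima fall at bins 0, 2 and 4, the floor is [1, 1, 1, 1.5, 2], and the residual is [0, 2, 0, 3.5, 0].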

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .
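
US8990073B2 claim 3 locates a maximum between each pair of consecutive minima of the residual spectrum. A minimal Python sketch follows; the tuple layout and the inclusion of the end bins as minima are assumptions for illustration.

```python
def detect_peaks(residual):
    """Return (left_min, peak_bin, right_min) for each pair of consecutive minima."""
    minima = [0]
    for i in range(1, len(residual) - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            minima.append(i)
    minima.append(len(residual) - 1)

    peaks = []
    for left, right in zip(minima, minima[1:]):
        # The peak is the largest bin in the piece delimited by the two minima.
        top = max(range(left, right + 1), key=lambda i: residual[i])
        peaks.append((left, top, right))
    return peaks
```

Applied to the residual [0, 2, 0, 3.5, 0], the sketch reports two peaks, one at bin 1 between minima 0 and 2 and one at bin 3 between minima 2 and 4.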

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .
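
The correlation map of US8990073B2 claim 4 assigns each peak's normalized correlation with the previous residual spectrum to every bin between the minima that delimit the peak. The Python sketch below uses a cosine-style normalization, which is an assumption; the claim does not give the normalization formula. Peaks are passed as hypothetical (left_min, right_min) index pairs.

```python
import math

def correlation_map(current, previous, peaks):
    """Per-bin map holding the normalized correlation of the enclosing peak."""
    cor_map = [0.0] * len(current)
    for left, right in peaks:
        bins = range(left, right + 1)
        num = sum(current[i] * previous[i] for i in bins)
        den = math.sqrt(sum(current[i] ** 2 for i in bins) *
                        sum(previous[i] ** 2 for i in bins))
        score = num / den if den > 0.0 else 0.0
        # A minimum shared by two adjacent peaks takes the later peak's score.
        for i in bins:
            cor_map[i] = score
    return cor_map
```

With a current residual [0, 2, 0, 3, 0], a previous residual [0, 2, 0, 0, 0] and peaks spanning bins 0 to 2 and 2 to 4, the first peak is unchanged between frames and scores 1, while the second peak has no counterpart in the previous frame and scores 0.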

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (second threshold) indicative of sound activity in the sound signal .
US20030101052A1
CLAIM 19
. The process of claim 18 , wherein comparing the voice bit array to the plurality of stored bit arrays comprises : comparing lines in the voice bit array to lines in each of the plurality of stored bit arrays to generate for each stored bit array a match value indicating how well the voice bit array matches the stored bit array ;
identifying a first stored bit array having a first match value indicating out of the plurality of stored arrays , the first stored bit array matches the voice bit array best ;
generating a result indicating no match was found and ending the process if the first match value is less than a first threshold ;
identifying a second stored bit array having a second match value indicating out of the plurality of stored arrays , the second stored bit array matches the voice bit array second best ;
generating the result indicating no match was found and ending the process if a difference between the first match value and the second match value is less than a required gap ;
partitioning each of the voice bit array and the first stored bit array into a plurality of segments , wherein each segment contains a plurality of lines and each segment of the voice bit array has a corresponding segment in the first stored bit array ;
for each segment , performing an OR operation on the lines of the segment to generate a result line for the segment ;
comparing the result lines for corresponding segments to generate a third match value indicating how well the voice bit array matches the stored bit array ;
and generating the result indicating no match was found if the third value is less than a second threshold (adaptive threshold) , otherwise a second result indicating a match was found .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (respective output signals) when a tonal sound signal is detected .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (respective output signals) calculated in a previous frame in a SNR calculation .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (respective output signals) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (respective output signals) when a music signal is detected .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (respective output signals) on the music signal .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (middle line) energy and an average frame energy .
US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (middle line) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (respective output signals) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (repeating step) for the first group of frequency bands and a second energy value (repeating step) of the second group of frequency bands ;

calculating a ratio between the first and second energy values (repeating step) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20030101052A1
CLAIM 7
. The process of claim 6 , further comprising : selecting one of the first segment of the source array and the first segment of the target array as the source array and selecting the other of the first segment of the source array and the first segment of the target array as the target array ;
and then repeating steps (first energy value, second energy value, second energy values) (b) through (e) of claim 6 .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (respective output signals) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (middle line) , and an initial value of the long-term correlation map .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .

US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .
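Steps (b) through (e) of claim 6 of US20030101052A1 can be illustrated with a minimal sketch. The claim does not say how two lines are compared or how wide the search range is, so the Hamming distance in `hamming` and the `radius` parameter are assumptions, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two lines (assumed match measure)."""
    return sum(x != y for x, y in zip(a, b))


def split_at_best_match(source, target, radius=1):
    """One pass of steps (b)-(e): match the middle line, then split both arrays."""
    s_mid = len(source) // 2
    t_mid = len(target) // 2
    # (b) range of target lines containing the target's middle line
    lo = max(0, t_mid - radius)
    hi = min(len(target) - 1, t_mid + radius)
    # (c) best-match line: smallest distance to the source's middle line
    best = min(range(lo, hi + 1), key=lambda j: hamming(source[s_mid], target[j]))
    # (d) source segments share the middle line
    source_parts = (source[:s_mid + 1], source[s_mid:])
    # (e) target segments share the best-match line
    target_parts = (target[:best + 1], target[best:])
    return best, source_parts, target_parts
```

Under these assumptions, the returned segments could be fed back into the same function to mirror the repetition recited in the dependent claim that follows claim 6.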

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (middle line) , and an initial value of the long-term correlation map .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .

US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency band) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (middle line) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030101052A1
CLAIM 5
. A voice recognition process comprising : constructing a voice bit array representing a portion of a voice signal , wherein the voice bit array includes a plurality of lines , each line includes a plurality of bits , and each bit in a line has a value indicating whether in the voice signal , a frequency band (current residual spectrum) corresponding to the bit had an energy greater than a threshold level during a time interval corresponding to the line ;
and comparing the voice bit array to a stored bit arrays to determine whether the voice bit array matches the stored bit array .

US20030101052A1
CLAIM 6
. The process of claim 5 , wherein comparing the voice bit array to a first of the stored arrays comprises : (a) selecting one of the voice bit array and the stored bit array as a source array and selecting the other of the voice bit array and the stored bit array as a target array ;
(b) comparing a middle line (current frame, current frame energy) of the source array to each line in a range of lines containing a middle line of the target array ;
(c) identifying in the range , a best match line that is the line best matching the middle line of the source array ;
(d) splitting the source array into a first segment including lines from a beginning of the source array to the middle line of the source array and a second segment including lines from the middle line of the source array to an end of the source array ;
and (e) splitting the target array into a first segment including lines from a beginning of the target array to the best-match line of the target array and a second segment including lines from the best-match line of the target array to an end of the target array .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (second value) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20030101052A1
CLAIM 16
. The process of claim 15 , wherein comparing the first and second lines comprises determining a factor by combining contributions associated with the bits of the first line , wherein each bit in the first line has a contribution that is : zero if the bit is equal to an identically-positioned bit in the second line ;
a first value if the bit is not equal to the identically-positioned bit in the second line and is equal to either bit adjacent to the identically-positioned bit in the second line ;
and a second value (noise ratio) if the bit is not equal to an identically-positioned bit in the second line and equal to neither bit adjacent to the identically-positioned bit in the second line .
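The per-bit contribution rule of claim 16 of US20030101052A1 can be sketched as below. The claim leaves the two penalty values and the way contributions are combined open, so the defaults `first_value=1` and `second_value=2`, the plain sum, and the handling of edge bits (which have only one neighbour) are assumptions; the function name is hypothetical.

```python
def line_mismatch_factor(first, second, first_value=1, second_value=2):
    """Sum the per-bit contributions of `first` compared against `second`."""
    if len(first) != len(second):
        raise ValueError("lines must have the same number of bits")
    total = 0
    for i, bit in enumerate(first):
        if bit == second[i]:
            continue  # equal to the identically-positioned bit: zero contribution
        # Bits adjacent to the identically-positioned bit in the second line.
        neighbours = [second[j] for j in (i - 1, i + 1) if 0 <= j < len(second)]
        # A near miss costs first_value, a full miss costs second_value.
        total += first_value if bit in neighbours else second_value
    return total
```

A factor of zero means the two lines are identical; larger values indicate a poorer match under the assumed weights.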

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (respective output signals) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (respective output signals) .
US20030101052A1
CLAIM 1
. A voice circuit comprising : a plurality of filters connected in parallel to receive a voice signal ;
and an energy detection circuit connected to outputs of the filters , the energy detection circuit determining amounts of energy in respective output signals (noise energy estimates) from the filters .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030093278A1

Filed: 2001-10-04     Issued: 2003-05-15

Method of bandwidth extension for narrow-band speech

(Original Assignee) AT&T Corp     (Current Assignee) Nuance Communications Inc

David Malah
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (higher band) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (prediction residual, lower band, linear prediction) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (prediction residual, lower band, linear prediction) comprises : searching for the minima in the frequency spectrum of the sound signal (higher band) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
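The three steps of claim 2 of US8990073B2 can be sketched as follows. This is an illustration, not the patentee's implementation: connecting the minima by straight lines, treating the first and last bins as minima, and the function names are all assumptions.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    last = len(spectrum) - 1
    idx = [0]
    for k in range(1, last):
        if spectrum[k] < spectrum[k - 1] and spectrum[k] <= spectrum[k + 1]:
            idx.append(k)
    idx.append(last)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor that connects successive minima."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)  # position between the two minima
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

With this construction the residual is zero at every minimum, so each piece of the residual between two successive minima forms one peak candidate.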
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (prediction residual, lower band, linear prediction) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (prediction residual, lower band, linear prediction) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
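The correlation map of claim 4 of US8990073B2 can be sketched as below, given the minima that delimit each peak. The claim says only "normalized correlation value", so the specific normalization used here (cross-correlation divided by the geometric mean of the two energies) is an assumption, as are the function name and the zero score for an empty previous shape.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding, for each peak, its normalized correlation score."""
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]    # peak in the current residual spectrum
        prev = previous[a:b + 1]  # shape at the same position in the previous one
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0
        # Assign the peak's score to every bin between its two minima.
        for k in range(a, b + 1):
            cmap[k] = score
    return cmap
```

A bin shared by two neighbouring peaks (the minimum between them) takes the score of the later peak in this sketch; the claim does not address that case.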
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (sampling rate) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
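The one-pole filtering and summation of claim 5 of US8990073B2 can be sketched as follows. The update factor `alpha=0.9` and a zero initial map are assumptions chosen for illustration; the claims require only that an update factor and an initial value exist. Function names are hypothetical.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (exponential) smoothing applied independently to each bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum of the filtered map over all frequency bins."""
    return sum(long_term)


# Usage: start from an assumed all-zero initial map and feed one map per frame.
state = [0.0, 0.0]
for frame_map in ([1.0, 1.0], [1.0, 1.0]):
    state = update_long_term_map(state, frame_map, alpha=0.5)
total = summed_long_term_map(state)
```

A persistently high sum indicates peaks that stay at the same positions from frame to frame, which is the tonal-stability cue the claims rely on.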
US20030093278A1
CLAIM 1
. A method of producing a wideband signal from a narrowband signal , the method comprising : computing M nb area coefficients from the narrowband signal ;
interpolating the M nb area coefficients into M wb area coefficients ;
generating a highband signal using the M wb area coefficients ;
and combining the highband signal with the narrowband signal interpolated to the highband sampling rate (frequency bin basis) to form the wideband signal .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (higher band) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (higher band) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (higher band) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 10
. A method for detecting sound activity (higher band) in a sound signal (higher band) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (higher band) is detected .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (higher band) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (higher band) based on a frequency dependent signal-to-noise ratio (SNR) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (higher band) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (higher band) and a ratio between a second order and a sixteenth order of linear prediction residual error (prediction residual, lower band, linear prediction) energies .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (higher band) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (higher band) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (higher band) prevents updating of noise energy estimates when a music signal is detected .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (higher band) in a current frame and an energy of the sound signal in a previous frame , for frequency bands (bandwidth extension, band data) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
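The spectral diversity calculation of claim 24 of US8990073B2 can be sketched as below. The claim does not give the weights or say how a zero previous-frame energy is handled, so uniform default weights and the `eps` guard are assumptions; the function and parameter names are hypothetical.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios over bands above `min_band`."""
    n = len(cur_energy)
    if weights is None:
        weights = [1.0] * n  # assumed uniform weighting
    total = 0.0
    for i in range(min_band + 1, n):  # only bands higher than the given number
        ratio = cur_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total
```

For example, with band energies `[1, 2, 4, 8]` in the current frame, `[1, 1, 2, 2]` in the previous frame and `min_band=1`, only bands 2 and 3 contribute.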
US20030093278A1
CLAIM 8
. A method of bandwidth extension (frequency bands) of a narrowband signal , the method comprising : computing M nb log-area coefficients from the narrowband signal ;
interpolating the M nb log-area coefficients into M wb log-area coefficients ;
generating a highband signal using the interpolated M wb log-area coefficients ;
and combining the highband signal with the narrowband signal interpolated to the highband sampling rate to generate a wideband signal .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 34
. A method of producing a wideband signal from a narrowband signal , the method receiving data associated with a narrowband signal , the method comprising : (1) computing M nb area coefficients using the narrowband data (frequency bands) ;
(2) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
and (3) synthesizing a wideband signal y wb using wideband coefficients processed from data associated with the M nb area coefficients and an excitation signal .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (linear predictive coefficient) indicative of an activity of the sound signal (higher band) .
US20030093278A1
CLAIM 16
. A method of producing a wideband signal from a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (activity prediction parameter) (LPCs) from the narrowband signal ;
(2) computing narrowband parcors $r_i$ associated with the narrowband LPCs ;
(3) computing $M_{nb}$ area coefficients $A_i^{nb}$ , $i = 1, 2, \ldots, M_{nb}$ , using the following : $A_i = \frac{1 + r_i}{1 - r_i} A_{i+1}$ , $i = M_{nb}, M_{nb} - 1, \ldots, 1$ , where $A_1$ corresponds to a cross-section at lips and $A_{M_{nb}+1}$ corresponds to a cross-section of a vocal tract at a glottis opening ;
(4) extracting $M_{wb}$ area coefficients from the $M_{nb}$ area coefficients using interpolation ;
(5) computing wideband parcors using the $M_{wb}$ area coefficients according to the following : $r_i^{wb} = \frac{A_i^{wb} - A_{i+1}^{wb}}{A_i^{wb} + A_{i+1}^{wb}}$ , $i = 1, 2, \ldots, M_{wb}$ ;
(6) computing wideband LPCs $a_i^{wb}$ , $i = 1, 2, \ldots, M_{wb}$ , from the wideband parcors ;
and (7) synthesizing a wideband signal y wb using the wideband LPCs and an excitation signal .
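The two recursions in steps (3) and (5) of claim 16 of US20030093278A1 are inverses of each other, which the sketch below illustrates. Normalizing the glottis-side area to 1.0 is an assumption (the claim fixes no scale), the interpolation of step (4) is omitted because the claim does not specify it, and the function names are hypothetical. Lists are 0-indexed, so `areas[0]` plays the role of the lip-side area.

```python
def parcors_to_areas(parcors, glottis_area=1.0):
    """Step (3): A_i = (1 + r_i) / (1 - r_i) * A_{i+1}, run from glottis to lips."""
    m = len(parcors)
    areas = [0.0] * (m + 1)
    areas[m] = glottis_area  # assumed normalization of the glottis-side area
    for i in range(m - 1, -1, -1):
        r = parcors[i]
        areas[i] = (1.0 + r) / (1.0 - r) * areas[i + 1]
    return areas


def areas_to_parcors(areas):
    """Step (5): r_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]
```

Because step (5) depends only on ratios of adjacent areas, the choice of `glottis_area` does not affect the recovered parcors.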

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (linear predictive coefficient) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (higher band) and the complementary non-stationarity parameter .
US20030093278A1
CLAIM 16
. A method of producing a wideband signal from a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (activity prediction parameter) (LPCs) from the narrowband signal ;
(2) computing narrowband parcors $r_i$ associated with the narrowband LPCs ;
(3) computing $M_{nb}$ area coefficients $A_i^{nb}$ , $i = 1, 2, \ldots, M_{nb}$ , using the following : $A_i = \frac{1 + r_i}{1 - r_i} A_{i+1}$ , $i = M_{nb}, M_{nb} - 1, \ldots, 1$ , where $A_1$ corresponds to a cross-section at lips and $A_{M_{nb}+1}$ corresponds to a cross-section of a vocal tract at a glottis opening ;
(4) extracting $M_{wb}$ area coefficients from the $M_{nb}$ area coefficients using interpolation ;
(5) computing wideband parcors using the $M_{wb}$ area coefficients according to the following : $r_i^{wb} = \frac{A_i^{wb} - A_{i+1}^{wb}}{A_i^{wb} + A_{i+1}^{wb}}$ , $i = 1, 2, \ldots, M_{wb}$ ;
(6) computing wideband LPCs $a_i^{wb}$ , $i = 1, 2, \ldots, M_{wb}$ , from the wideband parcors ;
and (7) synthesizing a wideband signal y wb using the wideband LPCs and an excitation signal .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (linear predictive coefficient) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20030093278A1
CLAIM 16
. A method of producing a wideband signal from a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (activity prediction parameter) (LPCs) from the narrowband signal ;
(2) computing narrowband parcors $r_i$ associated with the narrowband LPCs ;
(3) computing $M_{nb}$ area coefficients $A_i^{nb}$ , $i = 1, 2, \ldots, M_{nb}$ , using the following : $A_i = \frac{1 + r_i}{1 - r_i} A_{i+1}$ , $i = M_{nb}, M_{nb} - 1, \ldots, 1$ , where $A_1$ corresponds to a cross-section at lips and $A_{M_{nb}+1}$ corresponds to a cross-section of a vocal tract at a glottis opening ;
(4) extracting $M_{wb}$ area coefficients from the $M_{nb}$ area coefficients using interpolation ;
(5) computing wideband parcors using the $M_{wb}$ area coefficients according to the following : $r_i^{wb} = \frac{A_i^{wb} - A_{i+1}^{wb}}{A_i^{wb} + A_{i+1}^{wb}}$ , $i = 1, 2, \ldots, M_{wb}$ ;
(6) computing wideband LPCs $a_i^{wb}$ , $i = 1, 2, \ldots, M_{wb}$ , from the wideband parcors ;
and (7) synthesizing a wideband signal y wb using the wideband LPCs and an excitation signal .
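
Steps (3) to (5) of the US20030093278A1 claim above can be sketched as follows. This is only an illustration of the two recursions as written in the claim. The function names, the unit glottis area, and the choice of linear interpolation are assumptions made here, since the claim only says "using interpolation".

```python
def areas_from_parcors(r, glottis_area=1.0):
    """Step (3): A_i = (1 + r_i) / (1 - r_i) * A_{i+1}, for i = M, M-1, ..., 1.

    areas[i - 1] holds A_i and areas[M] holds A_{M+1} (glottis end).
    glottis_area is an assumed normalisation, not taken from the claim.
    """
    m = len(r)
    areas = [0.0] * (m + 1)
    areas[m] = glottis_area
    for i in range(m - 1, -1, -1):
        areas[i] = (1.0 + r[i]) / (1.0 - r[i]) * areas[i + 1]
    return areas


def interpolate_areas(areas, m_wb):
    """Step (4): resample the area values onto m_wb + 1 points.

    Linear interpolation is an assumption; the claim does not fix the scheme.
    """
    n = len(areas)
    out = []
    for k in range(m_wb + 1):
        x = k * (n - 1) / m_wb
        lo = int(x)
        hi = min(lo + 1, n - 1)
        frac = x - lo
        out.append((1.0 - frac) * areas[lo] + frac * areas[hi])
    return out


def parcors_from_areas(areas):
    """Step (5): r_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [
        (areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
        for i in range(len(areas) - 1)
    ]
```

Applying `parcors_from_areas` directly to the output of `areas_from_parcors` returns the original parcors, which is a quick check that the two formulas in steps (3) and (5) are inverses of each other.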

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (bandwidth extension, band data) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20030093278A1
CLAIM 8
. A method of bandwidth extension (frequency bands) of a narrowband signal , the method comprising : computing M nb log-area coefficients from the narrowband signal ;
interpolating the M nb log-area coefficients into M wb log-area coefficients ;
generating a highband signal using the interpolated M wb log-area coefficients ;
and combining the highband signal with the narrowband signal interpolated to the highband sampling rate to generate a wideband signal .

US20030093278A1
CLAIM 34
. A method of producing a wideband signal from a narrowband signal , the method receiving data associated with a narrowband signal , the method comprising : (1) computing M nb area coefficients using the narrowband data (frequency bands) ;
(2) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
and (3) synthesizing a wideband signal y wb using wideband coefficients processed from data associated with the M nb area coefficients and an excitation signal .
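
The noise character parameter recited in claim 28 of US8990073B2 above can be illustrated with a short sketch. The use of a plain sum as the "energy value" of each group, the smoothing factor `alpha`, and both function names are assumptions made for illustration; the claim itself fixes none of them.

```python
def noise_character(band_energies, n_first):
    """Ratio between the energy of the first n_first bands and the rest.

    Summing the per-band energies is an assumed definition of the
    "energy value" of each group.
    """
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / rest if rest > 0.0 else float("inf")


def update_long_term(previous, current, alpha=0.9):
    """Long-term value as a first-order recursive average (alpha is hypothetical)."""
    return alpha * previous + (1.0 - alpha) * current
```

For example, with band energies `[4.0, 4.0, 1.0, 1.0]` and `n_first = 2`, the first group carries 8.0 and the rest 2.0, so the ratio is 4.0.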

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (higher band) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (prediction residual, lower band, linear prediction) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (higher band) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (prediction residual, lower band, linear prediction) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (prediction residual, lower band, linear prediction) comprises : a locator of the minima in the frequency spectrum of the sound signal (higher band) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030093278A1
CLAIM 19
. The method of producing a wideband signal from a narrowband signal of claim 16 , the method further comprising : (8) generating the excitation signal from a narrowband prediction residual (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) signal using fullwave rectification .

US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US20030093278A1
CLAIM 30
. A method of extending the bandwidth of a narrowband signal , the method comprising : (1) computing narrowband linear predictive coefficients (LPCs) from the narrowband signal ;
(2) computing M nb area coefficients using the narrowband LPCs ;
(3) extracting M wb area coefficients from the M nb area coefficients using interpolation ;
(4) converting the M wb area coefficients into wideband LPCs ;
and (5) synthesizing a wideband signal y wb using the wideband LPCs and highpass filtered white noise in the higher band of an excitation signal and a linear prediction (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) residual signal in the lower band (current residual spectrum, residual error, linear prediction residual error energies, linear prediction) of the excitation signal .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (sampling rate) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20030093278A1
CLAIM 1
. A method of producing a wideband signal from a narrowband signal , the method comprising : computing M nb area coefficients from the narrowband signal ;
interpolating the M nb area coefficients into M wb area coefficients ;
generating a highband signal using the M wb area coefficients ;
and combining the highband signal with the narrowband signal interpolated to the highband sampling rate (frequency bin basis) to form the wideband signal .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (higher band) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 35
. A device for detecting sound activity (higher band) in a sound signal (higher band) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 36
. A device for detecting sound activity (higher band) in a sound signal (higher band) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (higher band) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (higher band) comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (higher band) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (higher band) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (higher band) .
US20030093278A1
CLAIM 29
. The method of extending the bandwidth of a narrowband signal of claim 25 , wherein the higher band (sound signal, sound activity detector, detecting sound activity) of the excitation signal is highpass filtered white noise .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1344067A

Filed: 2001-07-09     Issued: 2002-04-10

Transmission system using different coding principles

(Original Assignee) 皇家菲利浦电子有限公司     

F·伍帕曼
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1344067A
CLAIM 2
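. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .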

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (获得的) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
CN1344067A
CLAIM 2
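. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .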
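
The residual-spectrum computation recited in claims 1 and 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as below. This is an illustrative reading, not the patentee's implementation: treating the first and last bins as minima, connecting minima with straight lines, and all function names are assumptions.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are included (assumption)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    interior = [
        i for i in range(1, n - 1)
        if spectrum[i - 1] > spectrum[i] <= spectrum[i + 1]
    ]
    return [0] + interior + [n - 1]


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = (1.0 - t) * spectrum[a] + t * spectrum[b]
    return floor


def residual_spectrum(spectrum):
    """Subtract the spectral floor from the spectrum, bin by bin."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

With this construction the residual is zero at every minimum, so the pieces between successive minima stand out as isolated peaks.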

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (获得的) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
CN1344067A
CLAIM 2
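. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .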
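
Claim 3 of US8990073B2 locates one maximum between each pair of consecutive minima of the residual spectrum. A minimal sketch, assuming the end bins count as minima and using a hypothetical `detect_peaks` helper:

```python
def detect_peaks(residual):
    """Return (start, end, max_index) for each piece between successive minima.

    start and end are the delimiting minima; the end bins of the spectrum
    are treated as minima, which is an assumption of this sketch.
    """
    n = len(residual)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    minima = [0] + [
        i for i in range(1, n - 1)
        if residual[i - 1] > residual[i] <= residual[i + 1]
    ] + [n - 1]
    peaks = []
    for a, b in zip(minima, minima[1:]):
        k_max = max(range(a, b + 1), key=lambda k: residual[k])
        peaks.append((a, b, k_max))
    return peaks
```

Each returned tuple describes one "piece of the residual spectrum between a pair of successive minima", which is how claim 1 defines a peak.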

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (获得的) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
CN1344067A
CLAIM 2
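. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .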
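
The correlation map of claim 4 of US8990073B2 assigns, to every bin of a peak, the normalized correlation between the current and previous residual spectra over that peak. The sketch below assumes a cosine-style normalization and a zero score when either piece has no energy; the claim only says "normalized correlation value", so both choices, and the function name, are illustrative.

```python
import math


def correlation_map(current, previous, peaks):
    """Build a per-bin correlation map from (start, end) peak boundaries.

    peaks is a list of (start, end) bin indices of delimiting minima.
    Adjacent peaks share a boundary bin; the later peak overwrites it.
    """
    cmap = [0.0] * len(current)
    for a, b in peaks:
        bins = range(a, b + 1)
        num = sum(current[k] * previous[k] for k in bins)
        e_cur = sum(current[k] ** 2 for k in bins)
        e_prev = sum(previous[k] ** 2 for k in bins)
        den = math.sqrt(e_cur * e_prev)
        score = num / den if den > 0.0 else 0.0
        for k in bins:
            cmap[k] = score
    return cmap
```

A spectrum whose peaks keep the same shape and position from frame to frame scores close to 1 on those bins, which is the behaviour the claim relies on to indicate tonal stability.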

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (至少其中一个, 第二解) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1344067A
CLAIM 1
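. A receiver for obtaining a reconstructed signal from first and second decoded signals [第二解] (first frequency, first frequency bands) , the receiver comprising first and second decoders , wherein the receiver comprises signal combining means for combining a decoded signal obtained in a frequency-domain decoder with a decoded signal obtained by the other decoder into a reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .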

CN1344067A
CLAIM 2
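. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one [至少其中一个] (first frequency, first frequency bands) of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .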

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1344067A
CLAIM 2
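. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .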

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (获得的) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1344067A
CLAIM 2
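. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .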

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (获得的) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
CN1344067A
CLAIM 2
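. A decoding system for obtaining a reconstructed signal from first and second encoded signals , the system comprising first and second decoders , wherein at least one of the decoders is a frequency-domain decoder , and the decoding system comprises signal combining means for combining a decoded signal obtained [获得的] (current residual spectrum) from the frequency-domain decoder with a decoded signal obtained from the other decoder into the reconstructed signal , characterized in that the first decoder is a frequency-domain decoder and the second decoder is a time-domain decoder .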




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20010044722A1

Filed: 2001-01-05     Issued: 2001-11-22

System and method for modifying speech signals

(Original Assignee) Telefonaktiebolaget LM Ericsson AB     (Current Assignee) Optis Wireless Technology LLC

Harald Gustafsson, Ulf Lindgren, Clas Thurban, Petra Deutgen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum (frequency band, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 17
. A system for processing a narrowband speech signal at a receiver , comprising : an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum (frequency spectrum) ;
a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal ;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech ;
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency band, lower band) comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 17
. A system for processing a narrowband speech signal at a receiver , comprising : an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum (frequency spectrum) ;
a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal ;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech ;
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency band, lower band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency band, lower band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform (frequency bin) module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .
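
The one-pole filtering and summation recited in claim 5 of US8990073B2 above can be illustrated with a short sketch. The recursion form, the update factor `alpha`, and the function names are assumptions made for illustration; the claim itself only requires a one-pole filter applied bin by bin followed by a sum over bins.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole recursion per bin: y[k] = alpha * y_prev[k] + (1 - alpha) * x[k].

    alpha is a hypothetical update factor; lt_map holds the previous
    long-term values (its initial value is chosen by the caller).
    """
    return [alpha * y + (1.0 - alpha) * x for y, x in zip(lt_map, cmap)]

def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)

lt = update_long_term_map([0.0, 0.0], [1.0, 0.5], alpha=0.5)
total = summed_long_term_map(lt)
```

A larger `alpha` makes the long-term map react more slowly to a single frame, which is the usual trade-off for this kind of smoother.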

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) when a tonal sound signal is detected .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) calculated in a previous frame in a SNR calculation .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (analog format) .
US20010044722A1
CLAIM 7
. The method of claim 1 , further comprising the step of converting the wideband signal to an analog format (next frame) .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) for a next frame (analog format) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (residual error) energies .
US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error (residual error) signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 7
. The method of claim 1 , further comprising the step of converting the wideband signal to an analog format (next frame) .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) when a music signal is detected .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) on the music signal .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (predetermined frequency range, harmonic frequencies, narrow band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .
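
The spectral diversity computation recited in claim 24 of US8990073B2 above (per-band energy ratio between the current and previous frame, then a weighted sum over the bands above a given number) might be sketched as below. The uniform default weights, the small `eps` guard against division by zero, and all names are hypothetical; the claim does not specify them.

```python
def spectral_diversity(cur_energy, prev_energy, given_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above given_band."""
    n = len(cur_energy)
    if weights is None:
        weights = [1.0] * n  # assumption: uniform weights
    total = 0.0
    for b in range(given_band + 1, n):  # only bands higher than the given number
        ratio = cur_energy[b] / (prev_energy[b] + eps)  # eps is a hypothetical guard
        total += weights[b] * ratio
    return total

diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], given_band=1)
```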

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (predetermined frequency range, harmonic frequencies, narrow band) into a first group of a certain number of first frequency (predetermined frequency range, harmonic frequencies, narrow band) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .
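
The noise character parameter recited in claim 28 of US8990073B2 above can be illustrated with a minimal sketch. It assumes that the energy value of each group is the plain sum of its band energies and that the long-term value is obtained by exponential smoothing; both choices, like the names and the factor `alpha`, are hypothetical and not stated in the claim.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the remaining bands."""
    first = sum(band_energy[:n_first])   # first group of bands
    rest = sum(band_energy[n_first:])    # second group: the rest of the bands
    return first / rest if rest > 0 else float("inf")

def long_term_value(prev_lt, value, alpha=0.9):
    """Exponentially smoothed long-term value (assumed form)."""
    return alpha * prev_lt + (1.0 - alpha) * value

nc = noise_character([2.0, 2.0, 1.0, 1.0], n_first=2)
nc_lt = long_term_value(0.0, nc, alpha=0.5)
```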

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency band, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 17
. A system for processing a narrowband speech signal at a receiver , comprising : an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum (frequency spectrum) ;
a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal ;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech ;
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency band, lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 17
. A system for processing a narrowband speech signal at a receiver , comprising : an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum (frequency spectrum) ;
a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal ;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech ;
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency band, lower band) comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20010044722A1
CLAIM 1
. A method for processing a speech signal , comprising the steps of : analyzing a received , narrowband signal to determine synthetic upper band content ;
reproducing a lower band (current residual spectrum) of the speech signal using the received , narrowband signal ;
and combining the reproduced lower band with the determined , synthetic upper band to produce a wideband speech signal having a synthesized component .

US20010044722A1
CLAIM 3
. The method of claim 1 , wherein the step of analyzing further comprises the steps of : performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal ;
determining a pitch associated with the residual error signal ;
identifying peaks associated with the received , narrowband signal ;
and copying information from the received , narrowband signal into an upper frequency band (current residual spectrum) based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content .

US20010044722A1
CLAIM 17
. A system for processing a narrowband speech signal at a receiver , comprising : an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum (frequency spectrum) ;
a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal ;
a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech ;
a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component .
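
The locator, estimator and subtractor recited in claim 32 of US8990073B2 above can be sketched as three small functions. The piecewise-linear connection of the minima and the treatment of the first and last bins as minima are assumptions for illustration, as are all names; the claim only states that the spectral floor connects the minima with each other.

```python
def local_minima(spec):
    """Locate minima; the first and last bins are treated as minima (assumption)."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx

def spectral_floor(spec):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = list(spec)
    minima = local_minima(spec)
    for lo, hi in zip(minima, minima[1:]):
        if hi == lo:
            continue  # single-bin spectrum: nothing to interpolate
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spec[lo] + t * (spec[hi] - spec[lo])
    return floor

def residual_spectrum(spec):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    return [s - f for s, f in zip(spec, spectral_floor(spec))]

residual = residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0])
```

With this construction the residual is zero at every located minimum, so the peaks of the residual spectrum are naturally delimited by pairs of successive minima.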

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform (frequency bin) module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (predetermined frequency range, harmonic frequencies, narrow band) .
US20010044722A1
CLAIM 6
. The method of claim 1 , further comprising the step of selectively boosting a predetermined frequency range (first frequency, frequency bands, first frequency bands, noise energy estimates) of the wideband signal .

US20010044722A1
CLAIM 10
. A system according to claim 9 , wherein the means for analyzing a received , narrowband signal to determine synthetic upper band content comprises : a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrow band (first frequency, frequency bands, first frequency bands, noise energy estimates) voice signal and an error signal ;
a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal ;
and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component .

US20010044722A1
CLAIM 11
. A system according to claim 10 , wherein the residual extender and copy module comprises : a fast Fourier transform module for converting the error signal from the parametric spectral analysis module into the frequency domain ;
a peak detector for identifying the harmonic frequencies (first frequency, frequency bands, first frequency bands, noise energy estimates) of the error signal ;
and a copy module for copying the peaks identified by the peak detector into the upper frequency range .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6708145B1

Filed: 2000-12-20     Issued: 2004-03-16

Enhancing perceptual performance of sbr and related hfr coding methods by adaptive noise-floor addition and noise substitution limiting

(Original Assignee) Coding Technologies Sweden AB     (Current Assignee) Dolby International AB

Lars Gustaf Liljeryd, Kristofer Kjorling, Per Ekstrand, Fredrik Henn
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map (decoding method) .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US6708145B1
CLAIM 7
. A method according to claim 1 , in which a spectral envelope of the high band portion of the original signal is estimated and additionally multiplexed into the encoder output signal to be used by a decoding method (second group, term correlation map, second energy values) using a high-frequency reconstruction technique .
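
The last element of US8990073B2 claim 1 names three inputs to the long-term correlation map (an update factor, the correlation map of the current frame, and an initial value), but the claim text quoted here gives no formula. A minimal sketch, assuming a first-order recursive average per frequency bin; the function name, the factor 0.9, and the all-zero initial value are illustrative assumptions, not values taken from the patent:

```python
def update_long_term_map(lt_map, cor_map, alpha=0.9):
    """Blend the current frame's correlation map into the long-term map.

    alpha is the (assumed) update factor: values near 1 make the
    long-term map react slowly to the current frame.
    """
    if len(lt_map) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cor_map)]


# Assumed initial value: all zeros (no tonal stability observed yet).
lt_map = [0.0] * 4
frames = [
    [1.0, 0.0, 1.0, 0.2],  # correlation map of frame 1
    [1.0, 0.1, 0.0, 0.2],  # correlation map of frame 2
]
for cor_map in frames:
    lt_map = update_long_term_map(lt_map, cor_map)
```

Under this assumption a bin that stays highly correlated across frames (bin 0 above) accumulates a larger long-term value than a bin whose correlation drops (bin 2), which is the behaviour a tonal-stability indicator would need.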

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (source encoding) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .
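
The three steps of US8990073B2 claim 2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is an illustration under stated assumptions: the minima are connected by straight lines, the two end bins are treated as minima, and all function names are hypothetical. The claim text itself only says the minima are "connected with each other".

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Piecewise-linear floor through the minima (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


spectrum = [2.0, 5.0, 1.0, 4.0, 6.0, 3.0]
residual, minima = residual_spectrum(spectrum)
```

With this construction the residual is zero at every minimum, so each stretch between two consecutive minima is one candidate peak in the sense of claim 1.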

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (source encoding) comprises locating a maximum between each pair of two consecutive minima (minimum point) of the current residual spectrum .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points (consecutive minima, two consecutive minima) of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (source encoding) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (minimum point) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points (consecutive minima, two consecutive minima) of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .
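
US8990073B2 claim 4 assigns each peak a normalized correlation with the previous residual spectrum and spreads that score over the bins between the two minima that delimit the peak. A minimal sketch, assuming a cosine-style normalization (cross term divided by the square root of the product of the two energies); the claim text quoted here does not fix the normalization, and the function name is hypothetical:

```python
import math


def correlation_map(cur, prev, minima):
    """Per-bin correlation map for one frame.

    cur, prev: current and previous residual spectra (equal length).
    minima:    sorted bin indices of the minima of `cur`; each pair of
               consecutive minima delimits one peak.
    """
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur[i] * prev[i] for i in bins)
        den = math.sqrt(
            sum(cur[i] ** 2 for i in bins) * sum(prev[i] ** 2 for i in bins)
        )
        score = num / den if den > 0.0 else 0.0
        # A bin shared by two adjacent peaks keeps the later peak's score.
        for i in bins:
            cmap[i] = score
    return cmap


cur = [0.0, 3.0, 0.0, 2.0, 4.0, 0.0]
prev = [0.0, 2.0, 0.0, 4.0, 1.0, 0.0]
cmap = correlation_map(cur, prev, [0, 2, 5])
```

In this example the first peak has the same shape in both frames and scores 1.0, while the second peak changed shape and scores lower.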

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction residual error (source encoding) energies .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps (second order) : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (local maximum) energy .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum (average frame) points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands, band frequency) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6708145B1
CLAIM 2
. A method according to claim 1 , in which the step of estimating includes the following step : mapping the noise-floor level to several frequency bands (frequency bands, first frequency bands) to obtain a noise-floor level for each of the several frequency bands .

US6708145B1
CLAIM 17
. An apparatus for enhancing a source decoder , the source decoder generating a decoded signal by decoding an encoded signal obtained by source encoding of an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , wherein the decoded signal is used for high-frequency reconstruction to obtain a high-frequency reconstructed signal including a reconstructed high band portion of the original signal , comprising : a high frequency reconstruction module for generating a signal , the high-frequency reconstruction module having a summer for summing several high-frequency reconstructed signals , originating from different low band frequency (frequency bands, first frequency bands) ranges of the decoded signal to obtain the signal , and an analyzer for analyzing the low band portion of the decoded signal and for providing control data to the summer .
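
The spectral diversity computation of US8990073B2 claim 24 reduces to a per-band energy ratio followed by a weighted sum. A sketch under assumptions: the weights, the small `eps` guard against division by zero, and the function name are illustrative, and "higher than a given number" is read as band indices strictly greater than `min_band`.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands > min_band."""
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[b] / (prev_energy[b] + eps)
        total += weights[b] * ratio
    return total


cur_energy = [1.0, 4.0, 9.0, 8.0]
prev_energy = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.5, 0.5, 0.5]  # assumed uniform weights
diversity = spectral_diversity(cur_energy, prev_energy, 1, weights)
```

Only bands 2 and 3 contribute here (ratios 3 and 2), giving a value of about 2.5.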

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (polynomial representation) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6708145B1
CLAIM 5
. A method according to claim 1 , in which the noise-floor level is represented using linear predictive coding , or any other polynomial representation (binary decision) .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (frequency bands, band frequency) into a first group of a certain number of first frequency bands and a second group (decoding method) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (decoding method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6708145B1
CLAIM 2
. A method according to claim 1 , in which the step of estimating includes the following step : mapping the noise-floor level to several frequency bands (frequency bands, first frequency bands) to obtain a noise-floor level for each of the several frequency bands .

US6708145B1
CLAIM 7
. A method according to claim 1 , in which a spectral envelope of the high band portion of the original signal is estimated and additionally multiplexed into the encoder output signal to be used by a decoding method (second group, term correlation map, second energy values) using a high-frequency reconstruction technique .

US6708145B1
CLAIM 17
. An apparatus for enhancing a source decoder , the source decoder generating a decoded signal by decoding an encoded signal obtained by source encoding of an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , wherein the decoded signal is used for high-frequency reconstruction to obtain a high-frequency reconstructed signal including a reconstructed high band portion of the original signal , comprising : a high frequency reconstruction module for generating a signal , the high-frequency reconstruction module having a summer for summing several high-frequency reconstructed signals , originating from different low band frequency (frequency bands, first frequency bands) ranges of the decoded signal to obtain the signal , and an analyzer for analyzing the low band portion of the decoded signal and for providing control data to the summer .
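
The noise character parameter of US8990073B2 claim 28 is a ratio of two group energies followed by long-term smoothing. A minimal sketch, assuming each group's energy value is the sum of its band energies and the long-term value is a first-order recursive average; the split point, the smoothing factor, and the function names are assumptions, since the claim text quoted here specifies none of them:

```python
def noise_character(band_energy, n_first):
    """Ratio of the first group's energy to the energy of the remaining bands."""
    first = sum(band_energy[:n_first])
    rest = sum(band_energy[n_first:])
    return first / rest if rest > 0.0 else float("inf")


def long_term(prev_lt, value, alpha=0.9):
    """Assumed smoothing: blend the new value into the running long-term value."""
    return alpha * prev_lt + (1.0 - alpha) * value


band_energy = [4.0, 4.0, 1.0, 1.0, 2.0]
nchar = noise_character(band_energy, n_first=2)  # (4 + 4) / (1 + 1 + 2)
nchar_lt = long_term(0.0, nchar)
```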

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (source encoding) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6708145B1
CLAIM 1
. A method for enhancing a source encoding (current residual spectrum, residual error) method , the source encoding method generating an encoded signal by encoding an original signal , the original signal having a low band portion and a high band portion , the encoded signal including the low band portion of the original signal and not including the high band portion of the original signal , comprising the following steps : estimating a noise-floor level of the high band portion of the original signal , the noise floor level being a measure for a difference between a first spectral envelope determined by local minimum points of a spectral representation of the original signal and a second spectral envelope determined by local maximum points of a spectral representation of the original signal ;
and multiplexing the encoded signal including the low band portion of the original signal and the noise-floor level of the high band portion of the original signal to obtain an encoder output signal .




US20010001853A1

Filed: 2000-12-20     Issued: 2001-05-24

Low frequency spectral enhancement system and method

(Original Assignee) Mauro Anthony P.; Sih Gilbert C.     

Anthony Mauro, Gilbert Sih
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (fundamental frequencies) in the sound signal .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 10
. A method for detecting sound activity (fundamental frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (fundamental frequencies) in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US20010001853A1
CLAIM 4
. The system of claim 1 , further comprising : a speech detector (sound activity detection) operative to detect speech content of the signal .

US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (fundamental frequencies) detection comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US20010001853A1
CLAIM 4
. The system of claim 1 , further comprising : a speech detector (sound activity detection) operative to detect speech content of the signal .

US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (fundamental frequencies) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US20010001853A1
CLAIM 4
. The system of claim 1 , further comprising : a speech detector (sound activity detection) operative to detect speech content of the signal .

US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .
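
US8990073B2 claim 14 compares an average SNR (SNR av) with a threshold that depends on the long-term SNR (SNR LT), without stating the function in the claim text quoted here. A sketch assuming a linear threshold; the slope, the offset, and both function names are purely illustrative:

```python
def sad_threshold(snr_lt, slope=0.25, offset=2.0):
    """Assumed linear mapping from long-term SNR to a decision threshold."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Flag the frame as active sound when the average SNR exceeds the threshold."""
    return snr_av > sad_threshold(snr_lt)


loud_frame = is_active(snr_av=8.0, snr_lt=20.0)   # threshold is 7.0
quiet_frame = is_active(snr_av=6.0, snr_lt=20.0)
```

Tying the threshold to the long-term SNR lets the detector demand a larger margin in clean conditions and a smaller one in noisy conditions; whether the patent's function is linear is not stated in this chart.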

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (fundamental frequencies) detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
US20010001853A1
CLAIM 4
. The system of claim 1 , further comprising : a speech detector (sound activity detection) operative to detect speech content of the signal .

US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios (noise ratio, SNR LT, SNR calculation) above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (fundamental frequencies) detection further comprises updating the noise estimates for a next frame .
US20010001853A1
CLAIM 4
. The system of claim 1 , further comprising : a speech detector (sound activity detection) operative to detect speech content of the signal .

US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 35
. A device for detecting sound activity (fundamental frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 36
. A device for detecting sound activity (fundamental frequencies) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (fundamental frequencies) detector .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (fundamental frequencies) detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios (noise ratio, SNR LT, SNR calculation) above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (fundamental frequencies) detector .
US20010001853A1
CLAIM 13
. A method for enhancing a signal , comprising : determining a window of frequencies of the signal ;
determining a first set of frequencies within the window , wherein the first set of frequencies have signal-to-noise ratios above a threshold ;
and amplifying fundamental frequencies (sound activity) within the first set .




EP1158494A1

Filed: 2000-12-08     Issued: 2001-11-28

Method and apparatus for performing audio coding and decoding by interleaving smoothed critical band envelopes at higher frequencies

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Oded Ghitza
US8990073B2
CLAIM 10
. A method for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (music signal) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
EP1158494A1
CLAIM 1
A method of coding an audio signal comprising the steps of : (a) dividing the audio signal into a plurality of frequency band signals , one or more of said frequency band signals being low frequency band (detecting sound activity) signals which comprise frequency components below a given threshold frequency , and one or more of said frequency band signals being high frequency band signals which comprise frequency components above the given threshold frequency ;
(b) coding at least one of said low frequency band signals so as to preserve at least some of the phase information comprised in a waveform representative of said low frequency band signal ;
(c) generating , for at least one of said high frequency band signals , a corresponding critical band envelope signal which is representative of at least a portion of an envelope of a waveform representative of said corresponding high frequency band signal but which substantially excludes phase information associated with said waveform representative of said corresponding high frequency band signal ;
and (d) coding said at least one of said high frequency band signals by encoding said critical band envelope signals corresponding thereto .

EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (music signal) is detected .
EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal (music signal) from a background noise signal and prevent update of noise energy estimates on the music signal .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .

EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
EP1158494A1
CLAIM 15
The method of any of the preceding claims wherein said one or more frequency bands (frequency bands) are approximately equally distributed along a Bark scale .
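
The spectral diversity computation recited in claim 24 of US8990073B2 can be illustrated with a short sketch. This is only one plausible reading of the claim language, not the patent's actual formula: the function name, the `weights` argument, the `eps` guard against division by zero, and the choice of a plain current-to-previous energy ratio are all assumptions made for illustration.

```python
def spectral_diversity(cur_energy, prev_energy, given_number, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > given_number.

    cur_energy / prev_energy: per-band energies of the current and previous frame.
    weights: hypothetical per-band weights (defaults to 1.0 for every band).
    eps: hypothetical guard that avoids dividing by a zero previous energy.
    """
    if len(cur_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(cur_energy)
    total = 0.0
    # Only bands strictly higher than the given number contribute.
    for band in range(given_number + 1, len(cur_energy)):
        ratio = cur_energy[band] / max(prev_energy[band], eps)
        total += weights[band] * ratio
    return total


diversity = spectral_diversity([1.0, 1.0, 4.0, 9.0], [1.0, 1.0, 2.0, 3.0], given_number=1)
print(diversity)
```

With unit weights the result is simply the sum of the ratios of the selected bands; a real implementation would take the band layout and weights from the patent's specification.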

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .
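
Claims 26 and 27 of US8990073B2 together describe a long-term value of a binary decision and a two-threshold test that blocks the noise energy update. The sketch below shows one way this logic could look. The exponential smoothing form, the `smoothing` constant, and all function and parameter names are assumptions for illustration; the claims do not fix them.

```python
def update_activity_prediction(previous, binary_decision, smoothing=0.99):
    """Long-term value of a binary decision via recursive smoothing (assumed form)."""
    current = 1.0 if binary_decision else 0.0
    return smoothing * previous + (1.0 - smoothing) * current


def noise_update_allowed(activity_prediction, non_stationarity,
                         activity_threshold, non_stationarity_threshold):
    """Return False when both parameters exceed their fixed thresholds at the same time."""
    blocked = (activity_prediction > activity_threshold
               and non_stationarity > non_stationarity_threshold)
    return not blocked


# Hypothetical values: the thresholds are placeholders, not taken from the patent.
prediction = 0.0
for decision in [True, True, False, True]:
    prediction = update_activity_prediction(prediction, decision, smoothing=0.5)
print(prediction, noise_update_allowed(prediction, 0.8, 0.5, 0.5))
```

The update is blocked only when both conditions hold simultaneously, which mirrors the "simultaneously" wording of claim 27.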

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands (frequency bands) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .

EP1158494A1
CLAIM 15
The method of any of the preceding claims wherein said one or more frequency bands (frequency bands) are approximately equally distributed along a Bark scale .
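
The steps of claim 28 of US8990073B2 (split the bands into two groups, compute an energy per group, take the ratio, then track a long-term value) can be sketched as follows. The function names, the `eps` guard, and the recursive smoothing used for the long-term value are assumptions for illustration; the claim does not state how the long-term value is formed.

```python
def noise_character(band_energies, num_first_bands, eps=1e-12):
    """Ratio between the energy of the first group of bands and the energy of the rest."""
    if not 0 < num_first_bands < len(band_energies):
        raise ValueError("both groups must contain at least one band")
    first_energy = sum(band_energies[:num_first_bands])
    second_energy = sum(band_energies[num_first_bands:])
    # eps is a hypothetical guard against a silent second group.
    return first_energy / max(second_energy, eps)


def long_term_noise_character(previous_long_term, current, smoothing=0.9):
    """Long-term value via recursive smoothing (assumed form, assumed constant)."""
    return smoothing * previous_long_term + (1.0 - smoothing) * current


current = noise_character([1.0, 1.0, 2.0, 4.0], num_first_bands=2)
long_term = long_term_noise_character(0.0, current)
print(current, long_term)
```

Under claim 29, the long-term value would then be compared with a fixed threshold, and the noise energy update prevented when the value falls below it.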

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
EP1158494A1
CLAIM 13
The method of any of the preceding claims wherein the audio signal comprises a speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 35
. A device for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (music signal) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
EP1158494A1
CLAIM 1
A method of coding an audio signal comprising the steps of : (a) dividing the audio signal into a plurality of frequency band signals , one or more of said frequency band signals being low frequency band (detecting sound activity) signals which comprise frequency components below a given threshold frequency , and one or more of said frequency band signals being high frequency band signals which comprise frequency components above the given threshold frequency ;
(b) coding at least one of said low frequency band signals so as to preserve at least some of the phase information comprised in a waveform representative of said low frequency band signal ;
(c) generating , for at least one of said high frequency band signals , a corresponding critical band envelope signal which is representative of at least a portion of an envelope of a waveform representative of said corresponding high frequency band signal but which substantially excludes phase information associated with said waveform representative of said corresponding high frequency band signal ;
and (d) coding said at least one of said high frequency band signals by encoding said critical band envelope signals corresponding thereto .

EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .

US8990073B2
CLAIM 36
. A device for detecting sound activity (low frequency band) in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (music signal) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
EP1158494A1
CLAIM 1
A method of coding an audio signal comprising the steps of : (a) dividing the audio signal into a plurality of frequency band signals , one or more of said frequency band signals being low frequency band (detecting sound activity) signals which comprise frequency components below a given threshold frequency , and one or more of said frequency band signals being high frequency band signals which comprise frequency components above the given threshold frequency ;
(b) coding at least one of said low frequency band signals so as to preserve at least some of the phase information comprised in a waveform representative of said low frequency band signal ;
(c) generating , for at least one of said high frequency band signals , a corresponding critical band envelope signal which is representative of at least a portion of an envelope of a waveform representative of said corresponding high frequency band signal but which substantially excludes phase information associated with said waveform representative of said corresponding high frequency band signal ;
and (d) coding said at least one of said high frequency band signals by encoding said critical band envelope signals corresponding thereto .

EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (music signal) from a background noise signal and preventing update of noise energy estimates .
EP1158494A1
CLAIM 14
The method of any of claims 1 to 12 wherein the audio signal comprises a music signal (music signal) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20020111798A1

Filed: 2000-12-08     Issued: 2002-08-15

Method and apparatus for robust speech classification

(Original Assignee) Qualcomm Inc     (Current Assignee) Qualcomm Inc

Pengjun Huang
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (correlation Coefficient Function threshold, parameter analyzer) of the long term correlation map .
US20020111798A1
CLAIM 1
. A method of speech classification , comprising : inputting classification parameters to a speech classifier from external components ;
generating , in the speech classifier , internal classification parameters from at least one of the input parameters ;
setting a Normalized Auto-correlation Coefficient Function threshold (initial value) and selecting a parameter analyzer (initial value) according to a signal environment ;
and analyzing the input parameters and the internal parameters to produce a speech mode classification .
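
The last element of claim 1 of US8990073B2 names three inputs to the long-term correlation map: an update factor, the correlation map of the current frame, and an initial value. A minimal sketch of such an update is given below, assuming a first-order recursive (exponential) smoothing per frequency bin and a zero initial value. Both choices, and all names, are assumptions for illustration rather than the patent's stated formula.

```python
def update_long_term_map(long_term_map, current_map, update_factor):
    """One smoothing step per frequency bin (assumed recursive form)."""
    if len(long_term_map) != len(current_map):
        raise ValueError("maps must cover the same frequency bins")
    return [update_factor * lt + (1.0 - update_factor) * cur
            for lt, cur in zip(long_term_map, current_map)]


def summed_map(long_term_map):
    """Sum over all bins, usable as a single tonal-stability indicator."""
    return sum(long_term_map)


num_bins = 4
long_term = [0.0] * num_bins           # initial value (assumed to be zero)
stable_frame = [1.0, 0.0, 1.0, 0.0]    # hypothetical correlation map of a tonal frame
for _ in range(3):
    long_term = update_long_term_map(long_term, stable_frame, update_factor=0.5)
print(long_term, summed_map(long_term))
```

Bins whose peaks keep correlating from frame to frame climb toward 1, while the others stay near the initial value, so a large summed map suggests a tonally stable signal.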

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (current frame energy) and an average frame energy .
US20020111798A1
CLAIM 10
. The method of claim 1 wherein the internal parameters comprise a current frame energy (current frame energy) parameter .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US20020111798A1
CLAIM 2
. The method of claim 1 wherein the input parameters comprise a noise suppressed speech signal (noise character parameter, activity prediction parameter) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (correlation Coefficient Function threshold, parameter analyzer) of the long-term correlation map .
US20020111798A1
CLAIM 1
. A method of speech classification , comprising : inputting classification parameters to a speech classifier from external components ;
generating , in the speech classifier , internal classification parameters from at least one of the input parameters ;
setting a Normalized Auto-correlation Coefficient Function threshold (initial value) and selecting a parameter analyzer (initial value) according to a signal environment ;
and analyzing the input parameters and the internal parameters to produce a speech mode classification .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (correlation Coefficient Function threshold, parameter analyzer) of the long-term correlation map .
US20020111798A1
CLAIM 1
. A method of speech classification , comprising : inputting classification parameters to a speech classifier from external components ;
generating , in the speech classifier , internal classification parameters from at least one of the input parameters ;
setting a Normalized Auto-correlation Coefficient Function threshold (initial value) and selecting a parameter analyzer (initial value) according to a signal environment ;
and analyzing the input parameters and the internal parameters to produce a speech mode classification .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (Noise Ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US20020111798A1
CLAIM 3
. The method of claim 1 wherein the input parameters comprise Signal to Noise Ratio (noise ratio) information for a noise suppressed speech signal .
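
The comparator of claim 38 of US8990073B2 tests an average SNR against a threshold that depends on the long-term SNR. The sketch below assumes a linear threshold function; the `slope` and `offset` parameters are hypothetical and must be supplied by the caller, since the claim only says the threshold is "a function of" the long-term SNR.

```python
def vad_threshold(snr_long_term, slope, offset):
    """Threshold as a function of the long-term SNR (linear form is an assumption)."""
    return slope * snr_long_term + offset


def is_active(snr_average, snr_long_term, slope, offset):
    """True when the average SNR is larger than the computed threshold."""
    return snr_average > vad_threshold(snr_long_term, slope, offset)


# Hypothetical numbers chosen only to exercise both outcomes.
print(is_active(20.0, 10.0, slope=0.5, offset=10.0))
print(is_active(12.0, 10.0, slope=0.5, offset=10.0))
```

Tying the threshold to the long-term SNR lets the detector demand a larger margin in clean conditions and a smaller one in noisy conditions, or the reverse, depending on the sign of the slope.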




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2002118517A

Filed: 2000-12-04     Issued: 2002-04-19

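Orthogonal transform device and method, inverse orthogonal transform device and method, transform coding device and method, and decoding device and method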

(Original Assignee) Sony Corp; ソニー株式会社     

Kenichi Makino, Atsushi Matsumoto, Masayuki Nishiguchi, 淳 松本, 堅一 牧野, 正之 西口
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (なるサンプル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2002118517A
CLAIM 1
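[Claim 1] An orthogonal transform device that orthogonally transforms input time-series samples while overlapping them, characterized in that, when M time-series samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which is the boundary at which aliasing occurs during the inverse orthogonal transform, determined arbitrarily within the range 0 ≤ α < M.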

JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (なるサンプル) comprises : searching for the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JP2002118517A
CLAIM 1
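[Claim 1] An orthogonal transform device that orthogonally transforms input time-series samples while overlapping them, characterized in that, when M time-series samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which is the boundary at which aliasing occurs during the inverse orthogonal transform, determined arbitrarily within the range 0 ≤ α < M.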

JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.
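
Claim 2 of US8990073B2 breaks the residual spectrum computation into three steps: search for the minima, connect them to estimate a spectral floor, and subtract the floor. The sketch below assumes straight-line (piecewise linear) connections between minima and treats the first and last bins as minima so the floor spans the whole spectrum. Both are illustrative assumptions, as are the function names.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are included by assumption."""
    last = len(spectrum) - 1
    minima = [0]
    for i in range(1, last):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(last)
    return minima


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for left, right in zip(minima, minima[1:]):
        for i in range(left, right + 1):
            t = (i - left) / (right - left)
            floor[i] = spectrum[left] + t * (spectrum[right] - spectrum[left])
    return floor


def residual_spectrum(spectrum):
    """Spectrum minus the estimated floor; requires at least two bins."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)]


print(residual_spectrum([1.0, 5.0, 2.0, 6.0, 3.0]))
```

After subtraction every minimum sits at zero, so the residual consists of isolated humps, which is what makes the later peak-by-peak comparison between frames possible.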

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (なるサンプル) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
JP2002118517A
CLAIM 1
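[Claim 1] An orthogonal transform device that orthogonally transforms input time-series samples while overlapping them, characterized in that, when M time-series samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which is the boundary at which aliasing occurs during the inverse orthogonal transform, determined arbitrarily within the range 0 ≤ α < M.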
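
Claim 3 of US8990073B2 locates one maximum between each pair of consecutive minima of the residual spectrum. A minimal sketch follows, assuming the minima indices are already known and sorted; the function name and the tie-breaking rule (first maximum wins) are illustrative assumptions.

```python
def locate_peaks(residual, minima):
    """Return the index of the largest bin between each pair of consecutive minima."""
    peaks = []
    for left, right in zip(minima, minima[1:]):
        # max() over indices returns the first index holding the largest value.
        peaks.append(max(range(left, right + 1), key=lambda i: residual[i]))
    return peaks


# Hypothetical residual spectrum with minima at bins 0, 2 and 4.
print(locate_peaks([0.0, 3.5, 0.0, 3.5, 0.0], [0, 2, 4]))
```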

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (なるサンプル) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
JP2002118517A
CLAIM 1
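[Claim 1] An orthogonal transform device that orthogonally transforms input time-series samples while overlapping them, characterized in that, when M time-series samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which is the boundary at which aliasing occurs during the inverse orthogonal transform, determined arbitrarily within the range 0 ≤ α < M.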
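
Claim 4 of US8990073B2 builds the correlation map by computing a normalized correlation for each peak over the bins between its two delimiting minima and writing that value into every bin of the peak. The sketch below uses a plain cosine-style normalization, which is an assumption: the patent's specification may define the normalization differently. A bin shared by two neighbouring peaks simply takes the value of the later peak here, which is another illustrative choice.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding the normalized correlation of the peak each bin belongs to."""
    if len(current) != len(previous):
        raise ValueError("both residual spectra must have the same length")
    cmap = [0.0] * len(current)
    for left, right in zip(minima, minima[1:]):
        cur = current[left:right + 1]
        prev = previous[left:right + 1]
        numerator = sum(c * p for c, p in zip(cur, prev))
        denominator = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # A peak with no energy in either frame gets a score of zero.
        score = numerator / denominator if denominator > 0.0 else 0.0
        for i in range(left, right + 1):
            cmap[i] = score
    return cmap


current = [0.0, 3.5, 0.0, 3.5, 0.0]
previous = [0.0, 3.5, 0.0, 0.0, 0.0]   # the second peak is absent in the previous frame
print(correlation_map(current, previous, [0, 2, 4]))
```

A peak that keeps its shape and position across frames scores close to 1, while a peak that appears only in the current frame scores 0.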

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (音声信号) .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (音声信号) is detected .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (音声信号) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (音声信号) based on a frequency dependent signal-to-noise ratio (SNR) .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (音声信号) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision (決定手段) based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (音声信号) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2002118517A
CLAIM 8
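[Claim 8] A transform coding device that orthogonally transforms and compression-codes an input signal, comprising: prediction analysis means for taking in the input signal a predetermined number of samples at a time, performing prediction analysis, and outputting a prediction residual; characteristic determination means for determining a characteristic of the input signal for each predetermined number of samples; block length determination means [決定手段] (binary decision, update decision) for determining a block length for the orthogonal transform based on the characteristic determined by the characteristic determination means; orthogonal transform means for generating orthogonal transform coefficients by applying orthogonal transform processing to M time-series samples, using the block length determined by the block length determination means, with the sample position α that is the boundary at which aliasing occurs during the inverse orthogonal transform determined arbitrarily within the range 0 ≤ α < M, while overlapping the prediction residual output from the prediction analysis means as the input M time-series samples; and quantization means for quantizing the orthogonal transform coefficients generated by the orthogonal transform means.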

JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (音声信号) prevents updating of noise energy estimates when a music signal is detected .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (フレーム間) energy .
JP2002118517A
CLAIM 3
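[Claim 3] The orthogonal transform device according to claim 2, characterized in that the sample position α is matched between adjacent frames [フレーム間] (average frame) by appropriately selecting a window function.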

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (音声信号) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (音声信号) .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (決定手段) obtained from estimating the parameter related to the tonal stability of the sound signal (音声信号) and the complementary non-stationarity parameter .
JP2002118517A
CLAIM 8
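[Claim 8] A transform coding device that orthogonally transforms and compression-codes an input signal, comprising: prediction analysis means for taking in the input signal a predetermined number of samples at a time, performing prediction analysis, and outputting a prediction residual; characteristic determination means for determining a characteristic of the input signal for each predetermined number of samples; block length determination means [決定手段] (binary decision, update decision) for determining a block length for the orthogonal transform based on the characteristic determined by the characteristic determination means; orthogonal transform means for generating orthogonal transform coefficients by applying orthogonal transform processing to M time-series samples, using the block length determined by the block length determination means, with the sample position α that is the boundary at which aliasing occurs during the inverse orthogonal transform determined arbitrarily within the range 0 ≤ α < M, while overlapping the prediction residual output from the prediction analysis means as the input M time-series samples; and quantization means for quantizing the orthogonal transform coefficients generated by the orthogonal transform means.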

JP2002118517A
CLAIM 14
[Claim 14] The transform coding device according to claim 8, characterized in that the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (えること) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
JP2002118517A
CLAIM 8
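[Claim 8] A transform coding device that orthogonally transforms and compression-codes an input signal, comprising [えること] (first group): prediction analysis means for taking in the input signal a predetermined number of samples at a time, performing prediction analysis, and outputting a prediction residual; characteristic determination means for determining a characteristic of the input signal for each predetermined number of samples; block length determination means for determining a block length for the orthogonal transform based on the characteristic determined by the characteristic determination means; orthogonal transform means for generating orthogonal transform coefficients by applying orthogonal transform processing to M time-series samples, using the block length determined by the block length determination means, with the sample position α that is the boundary at which aliasing occurs during the inverse orthogonal transform determined arbitrarily within the range 0 ≤ α < M, while overlapping the prediction residual output from the prediction analysis means as the input M time-series samples; and quantization means for quantizing the orthogonal transform coefficients generated by the orthogonal transform means.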

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (なるサンプル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2002118517A
CLAIM 1
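[Claim 1] An orthogonal transform apparatus that orthogonally transforms input time-series samples while overlapping them, wherein, when time-series M samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which becomes the boundary at which aliasing occurs in the inverse orthogonal transform, arbitrarily determined within the range 0 ≤ α < M.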

JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (なるサンプル) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2002118517A
CLAIM 1
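[Claim 1] An orthogonal transform apparatus that orthogonally transforms input time-series samples while overlapping them, wherein, when time-series M samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which becomes the boundary at which aliasing occurs in the inverse orthogonal transform, arbitrarily determined within the range 0 ≤ α < M.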

JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (なるサンプル) comprises : a locator of the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JP2002118517A
CLAIM 1
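[Claim 1] An orthogonal transform apparatus that orthogonally transforms input time-series samples while overlapping them, wherein, when time-series M samples are orthogonally transformed, the orthogonal transform is performed with the sample [なるサンプル] (current residual spectrum) position α, which becomes the boundary at which aliasing occurs in the inverse orthogonal transform, arbitrarily determined within the range 0 ≤ α < M.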

JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (音声信号) .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JP2002118517A
CLAIM 14
[Claim 14] The transform coding apparatus according to claim 8, wherein the input signal is a speech signal [音声信号] (sound signal) and/or an acoustic signal.




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7191123B1

Filed: 2000-11-17     Issued: 2007-03-13

Gain-smoothing in wideband speech and audio signal decoder

(Original Assignee) VoiceAge Corp     (Current Assignee) SAINT LAWRENCE COMMUNICATIONS LLC

Bruno Bessette, Redwan Salami, Roch Lefebvre
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
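Illustrative sketch (not taken from either patent): one possible reading of the last two steps of the claim above, assuming the residual spectra and the minima indices that delimit the peaks are already available. The normalized cross-correlation used as the per-peak measure, the update factor `alpha`, and all function names are assumptions; the claim only requires an update factor, the current correlation map, and an initial value.

```python
import math

def correlation_map(cur_res, prev_res, minima):
    """Correlate each peak (bins between successive minima) with the
    same bin range in the previous residual spectrum."""
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        cur = cur_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Assumed measure: normalized cross-correlation, 0 when undefined.
        value = num / den if den > 0.0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap

def update_long_term(long_term, cmap, alpha=0.9):
    """Recursive update; long_term starts from an initial value (e.g. zeros)."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(long_term, cmap)]
```

Summing the long-term map over all bins gives a single number that grows when the same peaks persist from frame to frame.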
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .
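Illustrative sketch of the voicing factor relation quoted in the US7191123B1 claim above, rv = (Ev − Ec)/(Ev + Ec). The function name and the zero-energy fallback are assumptions; the symbols b, vT, g, and ck follow the claim text.

```python
def voicing_factor(b, v_T, g, c_k):
    """Compute rv = (Ev - Ec) / (Ev + Ec).

    b   : pitch gain
    v_T : adaptive codebook vector at pitch delay T
    g   : innovative codebook gain
    c_k : innovative codevector at index k
    """
    # Ev: energy of the scaled adaptive codevector b * v_T.
    ev = sum((b * x) ** 2 for x in v_T)
    # Ec: energy of the scaled innovative codevector g * c_k.
    ec = sum((g * x) ** 2 for x in c_k)
    total = ev + ec
    if total == 0.0:
        return 0.0  # assumed fallback; the claim does not cover this case
    return (ev - ec) / total
```

The result lies between -1 (purely innovative excitation) and 1 (purely adaptive excitation).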

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (adaptive codebook) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
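Illustrative sketch (not taken from either patent) of the three steps in the claim above: search for minima, connect them into a spectral floor, and subtract the floor. Connecting the minima by straight lines and treating the first and last bins as floor anchors are assumptions made so the floor covers every bin.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are always included (assumed)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx

def residual_spectrum(spectrum):
    """Return (residual, minima) for a spectrum with at least two bins."""
    if len(spectrum) < 2:
        raise ValueError("spectrum needs at least two bins")
    minima = find_minima(spectrum)
    floor = [0.0] * len(spectrum)
    # Connect successive minima with straight line segments.
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

The returned `minima` list also marks the peak boundaries, since each peak is the piece of the residual between two successive minima.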
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (adaptive codebook) .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
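Illustrative sketch (not taken from either patent) of the two strong-tone tests recited in claims 7 and 8: a fixed threshold applied per bin of the correlation map, and an adaptive threshold applied to the summed long-term correlation map. The claims do not say how the adaptive threshold is adapted, so it is passed in by the caller; both function names are assumptions.

```python
def strong_tone_bins(cmap, fixed_threshold):
    """Bins of the correlation map whose magnitude exceeds a fixed threshold."""
    return [i for i, v in enumerate(cmap) if v > fixed_threshold]

def is_tonal(long_term_map, adaptive_threshold):
    """Compare the summed long-term correlation map with a threshold
    supplied by the caller (its adaptation rule is not modelled here)."""
    return sum(long_term_map) > adaptive_threshold
```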
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (adaptive codebook) is detected .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (adaptive codebook) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (adaptive codebook) based on a frequency dependent signal-to-noise ratio (SNR) .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (adaptive codebook) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (adaptive codebook) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
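Illustrative sketch (not taken from either patent) of the SNR-based decision in claims 18 and 19, combined with per-band SNRs computed against noise estimates from the previous frame. Averaging the per-band ratios before converting to dB, and treating the equal-to-threshold case as inactive, are assumptions; the claims shown here specify neither.

```python
import math

def average_snr_db(band_energies, prev_noise_estimates, eps=1e-12):
    """Average SNR in dB over frequency bands (assumed averaging rule).

    prev_noise_estimates are the per-band noise energies from the previous frame.
    """
    ratios = [e / (n + eps) for e, n in zip(band_energies, prev_noise_estimates)]
    return 10.0 * math.log10(sum(ratios) / len(ratios) + eps)

def classify_frame(snr_av, threshold):
    """Active when the average SNR is larger than the threshold, else inactive."""
    return "active" if snr_av > threshold else "inactive"
```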
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) prevents updating of noise energy estimates when a music signal is detected .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update (base stations) of noise energy estimates on the music signal .
US7191123B1
CLAIM 37
. A cellular communication system for servicing a large geographical area divided into a plurality of cells , comprising : mobile transmitter/receiver units ;
cellular base stations (prevent update) respectively situated in said cells ;
means for controlling communication between the cellular base stations ;
a bidirectional wireless communication sub-system between each mobile unit situated in one cell and the cellular base station of said one cell , said bidirectional wireless communication sub-system comprising in both the mobile unit and the cellular base station (a) a transmitter including an encoder for encoding a wideband signal and means for transmitting the encoded wideband signal , and (b) a receiver including means for receiving a transmitted encoded wideband signal and a decoder for decoding the received encoded wideband signal ;
wherein said decoder comprises means responsive to a set of wideband signal encoding parameters for decoding the received encoded wideband signal , and wherein said wideband signal decoding means comprises a device as recited in claim 21 , for producing a gain-smoothed codevector during decoding of the encoded wideband signal from said set of wideband signal encoding parameters .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (adaptive codebook) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
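Illustrative sketch (not taken from either patent) of the spectral diversity computation recited above: a per-band ratio of current to previous frame energy, summed with weights over the bands above a given index. The uniform default weights, the guard `eps`, and the parameter names are assumptions; the claim does not give the weights.

```python
def spectral_diversity(cur_energies, prev_energies, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands > min_band."""
    if weights is None:
        weights = [1.0] * len(cur_energies)  # assumed uniform weighting
    total = 0.0
    for i in range(min_band + 1, len(cur_energies)):
        # Ratio between the band energy in the current and previous frame.
        ratio = cur_energies[i] / (prev_energies[i] + eps)
        total += weights[i] * ratio
    return total
```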
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (adaptive codebook) .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) and the complementary non-stationarity parameter .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .
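As an illustration only, the voicing factor relation recited in US7191123B1 claim 6, rv = (Ev − Ec)/(Ev + Ec), can be sketched as below. The function names, the list-based signal representation, and the convention of returning 0.0 for a frame with zero total energy are assumptions made for this sketch and are not taken from either patent.

```python
from typing import Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def voicing_factor(v_T: Sequence[float], c_k: Sequence[float],
                   b: float, g: float) -> float:
    """rv = (Ev - Ec) / (Ev + Ec), following the relation in claim 6.

    v_T: adaptive codebook vector at pitch delay T
    c_k: innovative codevector at index k
    b:   pitch gain
    g:   innovative codebook gain
    """
    e_v = energy([b * s for s in v_T])   # energy of scaled adaptive codevector b*vT
    e_c = energy([g * s for s in c_k])   # energy of scaled innovative codevector g*ck
    total = e_v + e_c
    if total == 0.0:
        return 0.0                       # assumption: neutral value for a silent frame
    return (e_v - e_c) / total
```

Under this relation rv approaches 1 when the adaptive (pitch) contribution dominates and −1 when the innovative contribution dominates.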

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .
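For orientation, the peak detection, correlation map, and long-term correlation map elements of US8990073B2 claim 31 might be sketched as follows. This is a rough sketch under stated assumptions, not the patented implementation: the normalized squared correlation, the first-order recursive update, and all function names are choices made here for illustration; the claim itself only requires that the long-term map depend on an update factor, the current correlation map, and an initial value.

```python
from typing import List, Sequence, Tuple


def detect_peaks(residual: Sequence[float]) -> List[Tuple[int, int]]:
    """Return peaks as (start, end) bin pairs between successive local minima."""
    minima = [i for i in range(1, len(residual) - 1)
              if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]]
    return list(zip(minima, minima[1:]))


def correlation_map(current: Sequence[float], previous: Sequence[float],
                    peaks: Sequence[Tuple[int, int]]) -> List[float]:
    """Per-bin map: each peak's bins hold that peak's correlation with the
    same bin range of the previous residual spectrum.
    Assumption: normalized squared correlation, in the range 0..1."""
    cor_map = [0.0] * len(current)
    for lo, hi in peaks:
        cur, prev = current[lo:hi + 1], previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / den if den > 0.0 else 0.0
        for k in range(lo, hi + 1):
            cor_map[k] = value   # a bin shared by two peaks keeps the later value
    return cor_map


def update_long_term(long_term: Sequence[float], cor_map: Sequence[float],
                     alpha: float) -> List[float]:
    """Assumed recursive form: lt[k] = alpha * lt[k] + (1 - alpha) * cor_map[k]."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]
```

A caller would start from an initial long-term map (for example all zeros), update it once per frame, and compare some summary of it against a threshold to flag tonal stability; that final decision rule is left out because the claim text quoted here does not specify it.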

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (adaptive codebook) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .
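The three elements of US8990073B2 claim 32 (locating minima, connecting them into a spectral floor, subtracting the floor) could be sketched roughly as below. The local-minimum test, the piecewise-linear connection of minima, and the choice to set the floor equal to the spectrum outside the first and last minimum are assumptions of this sketch; the claim only says the floor "connects the minima".

```python
from typing import List, Sequence


def find_minima(spectrum: Sequence[float]) -> List[int]:
    """Interior bins strictly below the left neighbour and not above the right one."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]


def spectral_floor(spectrum: Sequence[float], minima: Sequence[int]) -> List[float]:
    """Connect successive minima with straight lines (assumed interpolation).

    Outside the first/last minimum the floor is set to the spectrum itself,
    so the residual there is zero (an assumption of this sketch)."""
    floor = list(spectrum)
    for left, right in zip(minima, minima[1:]):
        span = right - left
        for i in range(left, right + 1):
            t = (i - left) / span
            floor[i] = (1.0 - t) * spectrum[left] + t * spectrum[right]
    return floor


def residual_spectrum(spectrum: Sequence[float]) -> List[float]:
    """Subtract the estimated floor from the spectrum, bin by bin."""
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]
```

Note that with a straight-line floor the residual is not guaranteed to be non-negative between two minima; whether and how to clamp it is not addressed by the claim language quoted here.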

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (adaptive codebook) .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (adaptive codebook) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (adaptive codebook) .
US7191123B1
CLAIM 6
. A gain-smoothed codevector producing method as claimed in claim 1 , wherein : finding a codevector comprises finding an innovative codevector in an innovative codebook in relation to an index k of said innovative codebook , said index k forming said at least one first wideband signal encoding parameter ;
and calculating a first factor comprises computing a voicing factor rv by means of the following relation : rv =(Ev−Ec)/(Ev+Ec) where : Ev is the energy of a scaled adaptive codevector bvT ;
Ec is the energy of a scaled innovative codevector gck ;
b is a pitch gain computed during encoding of the wideband signal ;
T is a pitch delay computed during encoding of the wideband signal ;
vT is an adaptive codebook (sound signal) vector at pitch delay T ;
g is an innovative codebook gain computed during encoding of the wideband signal ;
k is an index of the innovative codebook computed during encoding of the wideband signal ;
and ck is the innovative codevector of said innovative codebook at index k .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1216474A1

Filed: 2000-09-29     Issued: 2002-06-26

Efficient spectral envelope coding using variable time/frequency resolution and time/frequency switching

(Original Assignee) Coding Technologies Sweden AB     (Current Assignee) Dolby International AB

Per Ekstrand, Fredrik Henn, Kristofer KJÖRLING, Lars Gustaf Liljeryd
US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
EP1216474A1
CLAIM 3
. A method according to claim 2 , characterised in that said time (noise character parameter) /frequency representation is generated by a filterbank .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (given number) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1216474A1
CLAIM 3
. A method according to claim 2 , characterised in that said time (noise character parameter) /frequency representation is generated by a filterbank .

EP1216474A1
CLAIM 13
. A method according to claim 12 , characterised in that the direction which generates the least coding error for a given number (second energy values) of bits is chosen .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
EP1216474A1
CLAIM 3
. A method according to claim 2 , characterised in that said time (noise character parameter) /frequency representation is generated by a filterbank .
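To make the computation recited in US8990073B2 claims 28 and 29 concrete, a minimal sketch follows. The direction of the ratio (first group over second group), the first-order smoothing used for the long-term value, the handling of a zero denominator, and every name below are assumptions for illustration; the claims do not fix these details.

```python
import math
from typing import Sequence


def noise_character(band_energies: Sequence[float], n_first: int) -> float:
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Assumption: ratio taken as first / second; returns infinity when the
    second group carries no energy."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    if second == 0.0:
        return math.inf
    return first / second


def long_term(previous: float, current: float, alpha: float) -> float:
    """Assumed smoothing for the long-term value: alpha*previous + (1-alpha)*current."""
    return alpha * previous + (1.0 - alpha) * current


def prevent_noise_update(long_term_value: float, threshold: float) -> bool:
    """Claim 29: the update is prevented when the parameter is below a fixed threshold."""
    return long_term_value < threshold
```

For example, with band energies [1, 1, 2, 2, 4] and the first two bands forming the first group, the sketch gives a ratio of 2/8 = 0.25.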




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1093113A2

Filed: 2000-09-27     Issued: 2001-04-18

Method and apparatus for dynamic segmentation of a low bit rate digital voice message

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Motorola Solutions Inc

Kenneth Finlon, Jian-Cheng Huang, Sunil Satyamurti, Floyd Simpson
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
EP1093113A2
CLAIM 6
A method used in a speech encoder for dynamically segmenting frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors , said method comprising the steps of : a) selecting a first quantized speech spectral parameter vector as a current anchor vector ;
b) selecting a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
c) perturbing the target speech parameter vector to derive a plurality , K , of perturbed speech parameter vectors ;
d) quantizing each of the K perturbed speech parameter vectors ;
e) interpolating between the current anchor vector and each of the plurality (K) of quantized perturbed speech parameter vectors to derive K sets of interpolated speech parameter vectors ;
f) comparing each set of the K sets of interpolated speech parameter vectors to corresponding sampled speech parameter vectors to derive K distances ;
g) testing the K distances against a predetermined distance , and h) selecting one of K quantized perturbed speech parameter vectors from a previous frame (activity prediction parameter) as a best perturbed speech parameter vector when none of the plurality of distances is less than the predetermined distance .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
EP1093113A2
CLAIM 6
A method used in a speech encoder for dynamically segmenting frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors , said method comprising the steps of : a) selecting a first quantized speech spectral parameter vector as a current anchor vector ;
b) selecting a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
c) perturbing the target speech parameter vector to derive a plurality , K , of perturbed speech parameter vectors ;
d) quantizing each of the K perturbed speech parameter vectors ;
e) interpolating between the current anchor vector and each of the plurality (K) of quantized perturbed speech parameter vectors to derive K sets of interpolated speech parameter vectors ;
f) comparing each set of the K sets of interpolated speech parameter vectors to corresponding sampled speech parameter vectors to derive K distances ;
g) testing the K distances against a predetermined distance , and h) selecting one of K quantized perturbed speech parameter vectors from a previous frame (activity prediction parameter) as a best perturbed speech parameter vector when none of the plurality of distances is less than the predetermined distance .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
EP1093113A2
CLAIM 6
A method used in a speech encoder for dynamically segmenting frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors , said method comprising the steps of : a) selecting a first quantized speech spectral parameter vector as a current anchor vector ;
b) selecting a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
c) perturbing the target speech parameter vector to derive a plurality , K , of perturbed speech parameter vectors ;
d) quantizing each of the K perturbed speech parameter vectors ;
e) interpolating between the current anchor vector and each of the plurality (K) of quantized perturbed speech parameter vectors to derive K sets of interpolated speech parameter vectors ;
f) comparing each set of the K sets of interpolated speech parameter vectors to corresponding sampled speech parameter vectors to derive K distances ;
g) testing the K distances against a predetermined distance , and h) selecting one of K quantized perturbed speech parameter vectors from a previous frame (activity prediction parameter) as a best perturbed speech parameter vector when none of the plurality of distances is less than the predetermined distance .
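The logic recited in US8990073B2 claims 26 and 27 (a long-term value of a binary decision, and blocking of the noise energy update when two parameters both exceed fixed thresholds) might look like the sketch below. The smoothing form, the parameter names, and the threshold values in the usage are hypothetical; how the binary decision itself is derived is not modelled here.

```python
def update_activity_prediction(previous: float, decision: bool, alpha: float) -> float:
    """Long-term value of a binary decision (assumed first-order smoothing).

    decision: per-frame binary decision, supplied by the caller
    alpha:    assumed smoothing factor in the range 0..1
    """
    return alpha * previous + (1.0 - alpha) * (1.0 if decision else 0.0)


def noise_update_prevented(act_pred: float, nonstat: float,
                           act_threshold: float, nonstat_threshold: float) -> bool:
    """Claim 27: prevent the update only when BOTH parameters exceed their thresholds."""
    return act_pred > act_threshold and nonstat > nonstat_threshold
```

A caller would run `update_activity_prediction` once per frame and consult `noise_update_prevented` before refreshing its noise energy estimates.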

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value (repeating step) for the first group of frequency bands and a second energy value (repeating step) of the second group of frequency bands ;

calculating a ratio between the first and second energy values (repeating step) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
EP1093113A2
CLAIM 7
The method according to claim 1 , further comprising : i) selecting a second quantized speech parameter vector located one frame farther from the current anchor vector as the target speech parameter vector when at least one of K distances is less than the predetermined distance ;
j) repeating steps (first energy value, second energy value, second energy values) c) through h) ;
k) selecting one of a plurality of quantized perturbed speech parameter vectors from a previous frame as a best perturbed speech parameter vector when none of the K distances is less than the predetermined distance ;
and l) repeating steps i) through l) when at least one of the K distances is less than the predetermined distance .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
EP1093113A2
CLAIM 1
A system controller comprises a speech encoder (frequency spectrum, noise estimator) that dynamically segments frames of a low bit rate digital voice message , wherein speech model parameters have been generated in a sequence of frames , the speech model parameters including quantized speech spectral parameter vectors ;
and wherein the speech encoder comprises a central processor coupled to a memory that controls the central processor to : a) select a first quantized speech spectral parameter vector as a current anchor vector ;
b) select a second quantized speech spectral parameter vector located a predetermined number of frames (L MAX) from the current anchor vector as a target speech parameter vector ;
and c) perturb the target speech parameter vector to derive a plurality (K) of perturbed speech parameter vectors .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7139700B1

Filed: 2000-09-22     Issued: 2006-11-21

Hybrid speech coding and system

(Original Assignee) Texas Instruments Inc     (Current Assignee) Texas Instruments Inc

Jacek Stachurski, Alan V. McCree
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch and , voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .
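
The residual-spectrum steps recited in claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be illustrated with a minimal sketch. The function names, the use of straight-line interpolation to "connect" the minima, and the choice to treat both end bins as minima are assumptions made for illustration; the claim text does not specify them.

```python
def local_minima(spectrum):
    """Return indices of local minima of the spectrum.

    Assumption: the first and last bins are always included so that
    the floor spans the whole spectrum.
    """
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima


residual, minima = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
print(minima)    # [0, 2, 4]
print(residual)  # [0.0, 2.0, 0.0, 3.5, 0.0]
```

In this sketch each stretch of the residual between two successive minima is one "peak" in the sense of claim 1, so the returned minima list also delimits the peaks.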

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (linear prediction) residual error energies .
US7139700B1
CLAIM 1
. A hybrid speech encoder , comprising : (a) a linear prediction (linear prediction, residual error) , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (speech encoder) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (speech encoder) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (speech encoder) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US7139700B1
CLAIM 1
. A hybrid speech encoder (frequency spectrum, noise estimator) , comprising : (a) a linear prediction , pitch , and voicing analyzer ;
(b) a parametric encoder coupled to said analyzer ;
and (c) a waveform encoder coupled to said analyzer ;
(d) wherein said parametric encoder encodes strongly-voiced frames and said waveform encoder encodes both unvoiced and weakly-voiced frames including a pitch-prediction filter for weakly-voiced frames .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6355869B1

Filed: 2000-08-21     Issued: 2002-03-12

Method and system for creating musical scores from musical recordings

(Original Assignee) Duane Mitton     

Duane Mitton
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (sample rate) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (sample rate) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (sample rate) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .
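
The per-peak correlation map recited in claim 4 of US8990073B2 can be sketched as follows. This is an illustration under stated assumptions, not the patentee's implementation: the normalized correlation is taken here as the cosine-style form (inner product divided by the product of the norms), the peak boundaries are supplied as a list of minima indices, and the name `correlation_map` is hypothetical. A bin shared by two adjacent peaks takes the score of the later peak in this sketch.

```python
import math


def correlation_map(current, previous, minima):
    """Assign each peak's normalized correlation to all bins of that peak.

    current, previous: residual spectra of the current and previous frame.
    minima: sorted bin indices of successive minima delimiting the peaks.
    """
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Assumption: a peak with no energy in either frame scores zero.
        score = num / den if den > 0.0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap


current = [0.0, 2.0, 0.0, 3.5, 0.0]
previous = [0.0, 2.0, 0.0, 0.0, 0.0]   # second peak absent in previous frame
print(correlation_map(current, previous, [0, 2, 4]))
# [1.0, 1.0, 0.0, 0.0, 0.0]
```

A peak that keeps its shape and position from frame to frame scores near 1, while a peak with no counterpart in the previous frame scores 0, which is what makes the map usable as a tonal stability cue.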

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (sample rate) so as to produce a summed long-term correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .
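
The long-term correlation map of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over bins) can be sketched as below. The recursion form, the function names, and the default update factor `alpha=0.9` are assumptions for illustration; the claims only state that the map depends on an update factor, the current-frame correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole (first-order recursive) smoothing of each frequency bin.

    long_term: map from the previous frame (or the initial value).
    cmap: correlation map of the current frame.
    alpha: update factor, a hypothetical default.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


lt = [0.0, 0.0, 0.0]                 # assumed initial value: all zeros
for _ in range(2):                   # two frames with the same correlation map
    lt = update_long_term_map(lt, [1.0, 1.0, 0.0], alpha=0.5)
print(lt)              # [0.75, 0.75, 0.0]
print(summed_map(lt))  # 1.5
```

The summed value grows only when the same bins stay correlated over several frames, so a threshold on it responds to sustained tones rather than to single-frame coincidences.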

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (sample rate) having a magnitude that exceeds a given fixed threshold .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (sample rate) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (four points) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US6355869B1
CLAIM 13
. The method of claim 12 , wherein the interpolating is carried out between one of two , three and four points (second energy values) .
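
The noise character parameter of claim 28 of US8990073B2 (ratio between the energy of a first group of frequency bands and the energy of the remaining bands, followed by a long-term value) can be sketched as follows. The function names, the small `eps` guard against division by zero, and the exponential smoothing with factor `beta` are assumptions; the claim does not say how the long-term value is computed.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / max(rest, eps)


def update_long_term(previous, value, beta=0.9):
    """Assumed exponential smoothing to obtain a long-term value."""
    return beta * previous + (1.0 - beta) * value


ratio = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
print(ratio)                                    # 4.0
print(update_long_term(2.0, ratio, beta=0.5))   # 3.0
```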

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (sample rate) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (sample rate) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (sample rate) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (sample rate) so as to produce a summed long-term correlation map .
US6355869B1
CLAIM 8
. The method of claim 1 , wherein the musical recording includes a sample rate (first frequency, frequency spectrum, frequency bins) of one of 11025 samples/second , 22050 samples/second and 44100 samples/second .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6691082B1

Filed: 2000-08-02     Issued: 2004-02-10

Method and system for sub-band hybrid coding

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Joseph Gerard Aguilar, Juin-Hwey Chen, Vipul Parikh, Xiaoqin Sun
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (band signals) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (band signals) of the sound signal (adaptive codebook) in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (band signals) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (band signals) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (band signals) so as to produce a summed long-term correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (adaptive codebook) .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises searching in the correlation map for frequency bins (band signals) having a magnitude that exceeds a given fixed threshold .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (adaptive codebook) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (second encoding) in the sound signal .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 10
. A method for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (adaptive codebook) is detected .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (second encoding) in the sound signal (adaptive codebook) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection comprises detecting the sound signal (adaptive codebook) based on a frequency dependent signal-to-noise ratio (SNR) .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .
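
For orientation, the comparison recited in claim 14 of US8990073B2 (an average SNR tested against a threshold that depends on the long-term SNR) can be sketched as follows. This is a minimal illustration only: the linear form of the threshold and the `slope` and `offset` values are assumptions chosen for the example and are not taken from the patent or from US6691082B1.

```python
def snr_threshold(snr_lt_db, slope=0.1, offset=1.0):
    """Threshold as a function of the long-term SNR (SNR_LT).

    The linear form and the default constants are hypothetical placeholders.
    """
    return slope * snr_lt_db + offset


def is_active(snr_av_db, snr_lt_db):
    """Return True when the average SNR (SNR_av) exceeds the threshold."""
    return snr_av_db > snr_threshold(snr_lt_db)


if __name__ == "__main__":
    # With SNR_LT = 20 dB the assumed threshold is 0.1 * 20 + 1 = 3 dB.
    print(is_active(5.0, 20.0))
    print(is_active(2.0, 20.0))
```

Under this sketch a frame whose average SNR lies above the threshold would be treated as active and one below it as inactive, which is the decision that claims 18 and 19 build on.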

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection in the sound signal (adaptive codebook) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (second encoding) detection further comprises updating the noise estimates for a next frame .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .
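
Claims 15 and 16 of US8990073B2 recite using noise energy estimates from a previous frame in the SNR calculation and then updating the estimates for the next frame. A minimal sketch of that two-step pattern is given below. The per-band layout, the first-order smoothing rule, the `alpha` value and the `allow_update` flag are all assumptions made for illustration; the claims do not specify them.

```python
import math


def frame_snr_and_update(band_energy, prev_noise, allow_update, alpha=0.9):
    """Compute per-band SNR from last frame's noise, then prepare next frame's noise.

    band_energy: energies of the current frame, one value per band
    prev_noise:  noise energy estimates computed in the previous frame
    alpha:       hypothetical smoothing factor for the update
    """
    eps = 1e-12  # guards against division by zero and log of zero
    # Step 1: the SNR uses the noise estimates from the previous frame.
    snr_db = [10.0 * math.log10(max(e, eps) / max(n, eps))
              for e, n in zip(band_energy, prev_noise)]
    # Step 2: the estimates for the next frame are refreshed only when allowed.
    if allow_update:
        next_noise = [alpha * n + (1.0 - alpha) * e
                      for e, n in zip(band_energy, prev_noise)]
    else:
        next_noise = list(prev_noise)
    return snr_db, next_noise
```

Keeping the update behind a flag mirrors the way the dependent claims describe preventing the update when a tonal or music signal is detected.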

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (adaptive codebook) and a ratio between a second order and a sixteenth order of linear prediction residual error energies (time scale) .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US6691082B1
CLAIM 30
. The hybrid encoder of claim 29 , wherein the means for generating said adaptive codebook vector , v(n) , comprises : means for determining a last pitch period cycle of said quantized excitation signal , u(n) ;
means for stretching/compressing the time scale (linear prediction residual error energies) of the last pitch period cycle of said previously quantized excitation signal , u(n) ;
and means for copying said stretched/compressed last pitch period cycle in a current subframe according to said pitch period contour array , ip(i) .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (adaptive codebook) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) prevents updating of noise energy estimates when a music signal is detected .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (excitation signal, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy (fixed codebook) .
US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 33
. The hybrid encoder of claim 24 , further comprising means for finding an optimal combination of fixed codebook (average frame energy) pulse locations and pulse signs which minimizes the energy of a weighted coding error signal , ew(n) , within a current subframe .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (adaptive codebook) in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands (band signals) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .
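
The spectral diversity computation recited in claim 24 of US8990073B2 (a per-band energy ratio between the current and previous frame, summed with weights over the bands above a given index) can be illustrated with the sketch below. The plain current-over-previous ratio, the uniform default weights and the name `min_band` are assumptions for the example; the claim does not fix the exact ratio or the weights.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    cur_energy / prev_energy: per-band energies of the current and previous frame
    weights: optional per-band weights (hypothetical; defaults to 1.0 everywhere)
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)
    eps = 1e-12  # avoids division by zero for silent bands
    total = 0.0
    for i in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total
```

For example, with four bands and `min_band=1`, only bands 2 and 3 contribute to the sum.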

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (excitation signal, speech signal) indicative of an activity of the sound signal (adaptive codebook) .
US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (excitation signal, speech signal) comprises : calculating a long-term value of a binary decision (linear prediction coefficient) obtained from estimating the parameter related to the tonal stability of the sound signal (adaptive codebook) and the complementary non-stationarity parameter .
US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .

US6691082B1
CLAIM 36
. The hybrid decoder of claim 35 , wherein the RCELP decoder further comprises means for converting a decoded full-band line spectral frequency (LSF) vector into a baseband linear prediction coefficient (binary decision) (LPC) array .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (excitation signal, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (excitation signal, speech signal) comprises : dividing a plurality of frequency bands (band signals) into a first group (band signals) of a certain number of first frequency (band signals) bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signals) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .
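
The four steps of claim 28 of US8990073B2 (split the bands into two groups, compute an energy value per group, take their ratio, then track a long-term value) can be sketched as below. The use of summed energies per group, the first-order smoothing for the long-term value, and the names `n_first` and `alpha` are assumptions for illustration and are not specified by the claim.

```python
def noise_character(band_energy, n_first):
    """Ratio between the energy of the first n_first bands and the remaining bands."""
    first = sum(band_energy[:n_first])    # first group of frequency bands
    second = sum(band_energy[n_first:])   # second group: the rest of the bands
    return first / max(second, 1e-12)     # guard against an empty or silent group


def update_long_term(lt_value, current, alpha=0.9):
    """Hypothetical first-order smoothing used here as the long-term value."""
    return alpha * lt_value + (1.0 - alpha) * current
```

A caller would evaluate `noise_character` once per frame and feed the result into `update_long_term` together with the value carried over from the previous frame.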

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (excitation signal, speech signal) inferior than a given fixed threshold .
US6691082B1
CLAIM 22
. A hybrid encoder for encoding audio and speech signals (noise character parameter, activity prediction parameter) , the hybrid encoder comprising : means for separating an input signal into a first signal and a second signal ;
means for detecting a gain mismatch between said first signal and a second signal ;
means for adjusting for said gain mismatch detected by said detecting means ;
means for processing the first signal to derive a baseband signal ;
means for encoding said baseband signal using a relaxed code excited linear prediction (RCELP) encoder to derive a baseband RCELP encoded signal ;
means for encoding the second signal using a harmonic encoder to derive a harmonic encoded signal ;
and means for combining said baseband RCELP encoded signal with said harmonic encoded signal to form a combined hybrid encoded signal .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook vector , v(n) , based on a previously quantized excitation signal (noise character parameter, activity prediction parameter) , u(n) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (band signals) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .
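
The correlation map element of claim 30 of US8990073B2 compares each detected peak of the current residual spectrum with the shape at the same position in the previous residual spectrum. A minimal sketch is given below, assuming that peaks are the segments between successive minima (as the claim recites) and that the comparison is a normalized squared cross-correlation written to every bin of the peak. The normalization and the function name are assumptions; the claim does not state the formula.

```python
def correlation_map(cur_res, prev_res, minima):
    """Per-bin correlation map built peak by peak.

    cur_res / prev_res: residual spectra of the current and previous frame
    minima: sorted bin indices of the minima that delimit the peaks
    """
    cmap = [0.0] * len(cur_res)
    for lo, hi in zip(minima, minima[1:]):
        a = cur_res[lo:hi]    # detected peak in the current frame
        b = prev_res[lo:hi]   # shape at the same position in the previous frame
        num = sum(x * y for x, y in zip(a, b)) ** 2
        den = sum(x * x for x in a) * sum(y * y for y in b)
        value = num / den if den > 0.0 else 0.0
        for k in range(lo, hi):
            cmap[k] = value   # every bin of the peak shares the peak's value
    return cmap
```

With this normalization a peak that keeps its shape from frame to frame yields a value near 1, while a peak with no counterpart in the previous frame yields 0.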

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (adaptive codebook) using a frequency spectrum (band signals) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (band signals) of the sound signal (adaptive codebook) in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame (current frame) and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .
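
The three components of claim 32 of US8990073B2 (a locator of minima, an estimator of a spectral floor connecting the minima, and a subtractor) can be sketched as follows. Linear interpolation between neighbouring minima is an assumption, since the claim only says the floor "connects the minima". Leaving bins outside the first and last minimum at zero is a simplification made for the example.

```python
def find_minima(spectrum):
    """Indices of strict local minima of the spectrum (interior bins only)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] < spectrum[i + 1]]


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor through the minima from the spectrum."""
    minima = find_minima(spectrum)
    residual = [0.0] * len(spectrum)
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)  # position between the two minima, 0..1
            floor = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
            residual[k] = spectrum[k] - floor
    return minima, residual
```

By construction the residual is zero at each minimum, so the segments between successive minima can serve directly as the peaks recited in claim 31.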

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (band signals) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (band signals) so as to produce a summed long-term correlation map .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals (first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, first energy value) ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .
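
The filter and adder recited in claim 33 of US8990073B2 above can be pictured with a short sketch. It assumes a first-order recursive smoother applied to each frequency bin; the claim itself only requires filtering on a frequency bin by frequency bin basis and summing the result. The function name and the `alpha` smoothing factor are hypothetical and are not taken from either patent.

```python
def filter_and_sum(long_term_map, correlation_map, alpha=0.9):
    """Smooth the correlation map bin by bin, then sum over all bins.

    Assumption: a one-pole recursive filter per bin; `alpha` is a
    hypothetical smoothing factor, not a value taken from the patent.
    """
    if len(long_term_map) != len(correlation_map):
        raise ValueError("maps must cover the same frequency bins")
    # Filter: each bin is updated independently of its neighbours.
    filtered = [
        alpha * previous + (1.0 - alpha) * current
        for previous, current in zip(long_term_map, correlation_map)
    ]
    # Adder: a single scalar summarising all bins.
    return filtered, sum(filtered)
```

A caller would keep `filtered` as the state for the next frame and compare the summed value against a decision threshold of its own choosing.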

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (adaptive codebook) .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 35
. A device for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 36
. A device for detecting sound activity (second encoding) in a sound signal (adaptive codebook) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (second encoding) detector .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (second encoding) detector comprises a comparator of an average signal to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .
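
The comparator of claim 38 of US8990073B2 above can be illustrated as follows. The claim only states that the threshold is a function of the long-term SNR; the linear form and the `slope` and `offset` values below are assumptions made for illustration, and both function names are hypothetical.

```python
def snr_threshold(snr_lt, slope=0.5, offset=1.0):
    """Hypothetical threshold that grows linearly with the long-term SNR."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Comparator: the frame is flagged active when the average SNR
    exceeds the threshold derived from the long-term SNR."""
    return snr_av > snr_threshold(snr_lt)
```

With these assumed constants, a long-term SNR of 20 gives a threshold of 11, so an average SNR of 12 is classified as active and an average SNR of 10 as inactive.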

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (pre-processed signal) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (second encoding) detector .
US6691082B1
CLAIM 1
. A system for processing an input signal , the system comprising : means for separating the input signal into at least two sub-band signals ;
first means for encoding one of said at least two sub-band signals using a first encoding algorithm to produce at least one encoded output signal , said first means for encoding further comprising means for detecting a gain mismatch between said at least two sub-band signals ;
and means for adjusting said gain mismatch detected by said detecting means ;
and second means for encoding another of said at least two sub-band signals using a second encoding (sound activity, detecting sound activity) algorithm to produce at least one other encoded output signal , where said first encoding algorithm is different from said second encoding algorithm .

US6691082B1
CLAIM 23
. The hybrid encoder of claim 22 , wherein the means for encoding said second signal comprises : means for high-pass filtering and buffering an input signal comprised of a plurality of consecutive frames to derive a preprocessed signal , ps(m) ;
means for analyzing a current frame and at least one previously received frame from among said plurality of frames to derive a pitch period estimate ;
means for analyzing said pre-processed signal (noise estimator) , ps(m) , and said pitch period estimate to estimate a voicing cutoff frequency and to derive an all-pole model of the frequency response of the current speech frame dependent on said pitch period estimate , said voicing cutoff frequency , and ps(m) ;
means for outputting a line spectral frequency (LSF) representation of the all-pole model and a frame gain of the current frame ;
and means for quantizing said LSF representation , said voicing cutoff frequency , and said frame gain to derive a quantized LSF representation , a quantized voicing cutoff frequency , and a quantized frame gain .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (adaptive codebook) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (adaptive codebook) .
US6691082B1
CLAIM 29
. The hybrid encoder of claim 24 , further comprising means for generating an adaptive codebook (sound signal) vector , v(n) , based on a previously quantized excitation signal , u(n) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1119911A1

Filed: 2000-07-21     Issued: 2001-08-01

Filtering device

(Original Assignee) Koninklijke Philips NV     (Current Assignee) Koninklijke Philips NV

Béatrice PESQUET-POPESCU
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (digital signals) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
EP1119911A1
CLAIM 1
. A device for filtering an input sequence of digital signals (sound signal, music signal, second energy value, sound signal prevents updating) , comprising : (1) a first filtering stage , itself including : (A) a first splitting circuit , provided for subdividing the input sequence into two disjoint sets of samples , comprising respectively the odd samples c_0(2n+1) and the even samples c_0(2n) of said sequence ;
(B) a first predicting circuit , provided for generating "detail" coefficients d_1(n) = c_0(2n+1) - P1[ ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ] , where P1 is a first linear filter and ( ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ) is the vector containing only the even samples of the input signal ;
(C) a first updating circuit , provided for generating "approximation" coefficients c_1(n) = c_0(2n) + U1[ ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ] , where U1 is a second linear filter and ( ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ) is the vector containing the "detail" coefficients ;
(2) at least a second filtering stage , itself including : (D) a second splitting circuit , provided for subdividing the previously generated "approximation" vector into two disjoint sets of samples , similarly called c_1(2n+1) and c_1(2n) ;
(E) a second predicting circuit , provided for generating a second level of "detail" coefficients d_2(n) = c_1(2n+1) - P2[ ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ] , where P2 is a third linear filter and ( ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ) is the vector of "detail" coefficients of even order resulting from the second splitting operation ;
(F) a second updating circuit , provided for generating a second set of "approximation" coefficients c_2(n) = c_1(2n) + U2[ ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ] , where U2 is a fourth linear filter and ( ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ) is the following vector of "detail" coefficients ;
characterized in that the first updating circuit U1 and the second predicting circuit P2 are associated within an iterative cross-optimization operation carried out by using the second filter U1 for optimizing the third filter P2 and then said third filter P2 for optimizing said second filter U1 .
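
For orientation, the peak detection and correlation map steps of claim 1 of US8990073B2 above can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes the first and last bins count as minima, and it assumes a squared normalized cross-correlation as the per-peak measure. The function names are hypothetical, and the long-term smoothing recited at the end of the claim is not shown.

```python
def peak_bounds(residual):
    """Pairs (lo, hi) of successive minima; each pair delimits one peak.

    Assumption: the first and last bins are treated as minima.
    """
    minima = [0]
    for i in range(1, len(residual) - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            minima.append(i)
    minima.append(len(residual) - 1)
    return list(zip(minima, minima[1:]))


def correlation_map(current, previous):
    """Correlate each peak of the current residual spectrum with the shape
    found at the same bins of the previous residual spectrum."""
    cmap = [0.0] * len(current)
    for lo, hi in peak_bounds(current):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        # Assumed measure: squared normalized cross-correlation in [0, 1].
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / den if den > 0 else 0.0
        # Every bin of the peak receives the same correlation value.
        for k in range(lo, hi + 1):
            cmap[k] = value
    return cmap
```

Two identical residual spectra give a map of ones, while an all-zero previous spectrum gives a map of zeros, which matches the intuition that stable tones keep their peak shapes from frame to frame.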

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (digital signals) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
EP1119911A1
CLAIM 1
. A device for filtering an input sequence of digital signals (sound signal, music signal, second energy value, sound signal prevents updating) , comprising : (1) a first filtering stage , itself including : (A) a first splitting circuit , provided for subdividing the input sequence into two disjoint sets of samples , comprising respectively the odd samples c_0(2n+1) and the even samples c_0(2n) of said sequence ;
(B) a first predicting circuit , provided for generating "detail" coefficients d_1(n) = c_0(2n+1) - P1[ ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ] , where P1 is a first linear filter and ( ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ) is the vector containing only the even samples of the input signal ;
(C) a first updating circuit , provided for generating "approximation" coefficients c_1(n) = c_0(2n) + U1[ ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ] , where U1 is a second linear filter and ( ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ) is the vector containing the "detail" coefficients ;
(2) at least a second filtering stage , itself including : (D) a second splitting circuit , provided for subdividing the previously generated "approximation" vector into two disjoint sets of samples , similarly called c_1(2n+1) and c_1(2n) ;
(E) a second predicting circuit , provided for generating a second level of "detail" coefficients d_2(n) = c_1(2n+1) - P2[ ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ] , where P2 is a third linear filter and ( ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ) is the vector of "detail" coefficients of even order resulting from the second splitting operation ;
(F) a second updating circuit , provided for generating a second set of "approximation" coefficients c_2(n) = c_1(2n) + U2[ ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ] , where U2 is a fourth linear filter and ( ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ) is the following vector of "detail" coefficients ;
characterized in that the first updating circuit U1 and the second predicting circuit P2 are associated within an iterative cross-optimization operation carried out by using the second filter U1 for optimizing the third filter P2 and then said third filter P2 for optimizing said second filter U1 .
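
The three steps of claim 2 of US8990073B2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. The claim does not say how the minima are connected; straight-line interpolation between successive minima is an assumption here, as is treating the first and last bins as minima. All function names are hypothetical.

```python
def local_minima(spectrum):
    """Indices of local minima, with both end bins included (an assumption)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    minima = local_minima(spectrum)
    for lo, hi in zip(minima, minima[1:]):
        if hi == lo:
            continue  # degenerate span, e.g. a one-bin spectrum
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)
            floor[k] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor; the result is zero at every minimum."""
    return [s - f for s, f in zip(spectrum, spectral_floor(spectrum))]
```

For the spectrum `[2, 5, 3, 7, 4]` the minima sit at bins 0, 2 and 4, the floor is `[2.0, 2.5, 3.0, 3.5, 4.0]`, and the residual is `[0.0, 2.5, 0.0, 3.5, 0.0]`.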

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (digital signals) .
EP1119911A1
CLAIM 1
. A device for filtering an input sequence of digital signals (sound signal, music signal, second energy value, sound signal prevents updating) , comprising : (1) a first filtering stage , itself including : (A) a first splitting circuit , provided for subdividing the input sequence into two disjoint sets of samples , comprising respectively the odd samples c_0(2n+1) and the even samples c_0(2n) of said sequence ;
(B) a first predicting circuit , provided for generating "detail" coefficients d_1(n) = c_0(2n+1) - P1[ ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ] , where P1 is a first linear filter and ( ... , c_0(2n-2) , c_0(2n) , c_0(2n+2) , ... ) is the vector containing only the even samples of the input signal ;
(C) a first updating circuit , provided for generating "approximation" coefficients c_1(n) = c_0(2n) + U1[ ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ] , where U1 is a second linear filter and ( ... , d_1(n-1) , d_1(n) , d_1(n+1) , ... ) is the vector containing the "detail" coefficients ;
(2) at least a second filtering stage , itself including : (D) a second splitting circuit , provided for subdividing the previously generated "approximation" vector into two disjoint sets of samples , similarly called c_1(2n+1) and c_1(2n) ;
(E) a second predicting circuit , provided for generating a second level of "detail" coefficients d_2(n) = c_1(2n+1) - P2[ ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ] , where P2 is a third linear filter and ( ... , c_1(2n-2) , c_1(2n) , c_1(2n+2) , ... ) is the vector of "detail" coefficients of even order resulting from the second splitting operation ;
(F) a second updating circuit , provided for generating a second set of "approximation" coefficients c_2(n) = c_1(2n) + U2[ ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ] , where U2 is a fourth linear filter and ( ... , d_2(n-1) , d_2(n) , d_2(n+1) , ... ) is the following vector of "detail" coefficients ;
characterized in that the first updating circuit U1 and the second predicting circuit P2 are associated within an iterative cross-optimization operation carried out by using the second filter U1 for optimizing the third filter P2 and then said third filter P2 for optimizing said second filter U1 .
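
The split, predict and update circuits recited in claim 1 of EP1119911A1 above follow the general pattern of a lifting stage, which the sketch below illustrates. The predict and update filters used here (a two-tap average and a quarter-weighted sum) are stand-ins chosen for simplicity, not the filters of the reference, and the iterative cross-optimization of U1 and P2 is not modeled. All function names are hypothetical.

```python
def lifting_stage(c, predict, update):
    """One stage: split into even/odd samples, predict, then update."""
    even, odd = c[0::2], c[1::2]
    # Detail coefficients: odd samples minus a prediction from even samples.
    detail = [odd[n] - predict(even, n) for n in range(len(odd))]
    # Approximation coefficients: even samples plus an update from details.
    approx = [even[n] + update(detail, n) for n in range(len(even))]
    return approx, detail


def predict_average(even, n):
    """Stand-in predictor: mean of the two neighbouring even samples."""
    nxt = even[n + 1] if n + 1 < len(even) else even[n]
    return 0.5 * (even[n] + nxt)


def update_quarter(detail, n):
    """Stand-in updater: a quarter of the two neighbouring details."""
    if not detail:
        return 0.0
    cur = detail[n] if n < len(detail) else detail[-1]
    prev = detail[n - 1] if n >= 1 else cur
    return 0.25 * (prev + cur)


def two_stage_lifting(c0):
    """Second stage operates on the approximation output of the first."""
    c1, d1 = lifting_stage(c0, predict_average, update_quarter)
    c2, d2 = lifting_stage(c1, predict_average, update_quarter)
    return c2, d2, d1
```

A constant input is a quick sanity check: every detail coefficient is zero and the approximation coefficients reproduce the constant at each scale.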

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update (updating circuit) of noise energy estimates when a tonal sound signal (digital signals) is detected .
EP1119911A1
CLAIM 4
. A filtering device according to any one of claims 1 and 2 , characterized in that , at the last scale N , the last updating circuit (preventing update) U_N is optimized by using as optimization criterion the minimization of the variance of the "approximation" coefficients at this last scale .
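
Claim 11 of US8990073B2 above recites preventing the update of noise energy estimates when a tonal sound signal is detected. A minimal sketch of such a gate is shown below; the per-band recursive averaging and the `alpha` factor are assumptions made for illustration, and the function name is hypothetical.

```python
def update_noise_estimates(noise, frame_energy, tonal_detected, alpha=0.9):
    """Return the noise energy estimates for the next frame.

    Assumption: per-band recursive averaging with a hypothetical `alpha`.
    When a tonal signal is detected the estimates are left untouched.
    """
    if tonal_detected:
        return list(noise)
    return [
        alpha * n + (1.0 - alpha) * e
        for n, e in zip(noise, frame_energy)
    ]
```

Freezing the estimates in this way keeps sustained tones, such as music, from being absorbed into the noise floor.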

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (digital signals) for distinguishing a music signal (digital signals) from a background noise signal (first filter) and preventing update (updating circuit) of noise energy estimates .
EP1119911A1
CLAIM 4
. A filtering device according to any one of claims 1 and 2 , characterized in that , at the last scale N , the last updating circuit (preventing update) U_N is optimized by using as optimization criterion the minimization of the variance of the "approximation" coefficients at this last scale .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
EP1120775A1

Filed: 2000-06-01     Issued: 2001-08-01

Noise signal encoder and voice signal encoder

(Original Assignee) Panasonic Corp     (Current Assignee) Panasonic Corp

Koji Yoshida
US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (readable record) from a background noise signal (background noise signal) ;

wherein the tonal stability estimation is performed according to claim 1 .
EP1120775A1
CLAIM 4
A speech signal coder comprising : speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal (background noise signal) superimposed on this speech signal ;
speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means ;
speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment ;
the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means ;
and multiplexing means for multiplexing the outputs from said speech/non-speech deciding means , said speech coding means and said noise signal coder .

EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .
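
The four steps of claim 15 of EP1120775A1 above (analyze, store, detect a variation, update as required) can be pictured with the sketch below. The reference does not specify, in the claim text quoted here, which statistics are used or how a variation is detected; mean and variance, and a relative change in variance above a hypothetical `threshold`, are assumptions chosen for illustration. The class and method names are also hypothetical.

```python
from statistics import fmean, pvariance


class NoiseModel:
    """Stores a simple statistical description of the input noise."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical relative-change limit
        self.params = None          # stored (mean, variance), if any

    def _changed(self, stats):
        """Detect a variation of the model (assumed: variance change)."""
        old_var, new_var = self.params[1], stats[1]
        reference = max(old_var, 1e-12)
        return abs(new_var - old_var) / reference > self.threshold

    def process(self, frame):
        """Analyze one frame; return updated parameters only when needed."""
        stats = (fmean(frame), pvariance(frame))
        if self.params is None or self._changed(stats):
            self.params = stats     # store the updated model
            return stats            # output information on the update
        return None                 # model still valid, nothing to send
```

Returning `None` for unchanged frames mirrors the idea of outputting model information only "as required".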

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (readable record) is detected .
EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (readable record) from a background noise signal (background noise signal) and prevent update of noise energy estimates on the music signal .
EP1120775A1
CLAIM 4
A speech signal coder comprising : speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal (background noise signal) superimposed on this speech signal ;
speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means ;
speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment ;
the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means ;
and multiplexing means for multiplexing the outputs from said speech/non-speech deciding means , said speech coding means and said noise signal coder .

EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (readable record) from a background noise signal (background noise signal) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
EP1120775A1
CLAIM 4
A speech signal coder comprising : speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal (background noise signal) superimposed on this speech signal ;
speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means ;
speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment ;
the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means ;
and multiplexing means for multiplexing the outputs from said speech/non-speech deciding means , said speech coding means and said noise signal coder .

EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (readable record) from a background noise signal (background noise signal) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
EP1120775A1
CLAIM 4
A speech signal coder comprising : speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal (background noise signal) superimposed on this speech signal ;
speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means ;
speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment ;
the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means ;
and multiplexing means for multiplexing the outputs from said speech/non-speech deciding means , said speech coding means and said noise signal coder .

EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (readable record) from a background noise signal (background noise signal) and preventing update of noise energy estimates .
EP1120775A1
CLAIM 4
A speech signal coder comprising : speech/noise signal separating means for separating an input speech signal into a speech signal and a background noise signal (background noise signal) superimposed on this speech signal ;
speech/non-speech deciding means for deciding whether a signal is in a speech segment or non-speech segment including only a noise signal from the speech signal obtained from said input speech signal or said speech/noise signal separating means ;
speech coding means for performing speech coding on said input speech signal when the decision result indicates the speech segment ;
the noise signal coder according to claim 1 that performs coding on the background noise signal obtained from said speech/noise signal separating means ;
and multiplexing means for multiplexing the outputs from said speech/non-speech deciding means , said speech coding means and said noise signal coder .

EP1120775A1
CLAIM 15
A mechanically readable record (music signal) ing medium that records a program to execute the steps of : analyzing statistical characteristic quantities on an input noise signal ;
storing information on a noise model expressing the statistical characteristic quantities on the input noise signal ;
detecting a variation of the noise model expressing the input noise signal ;
and updating the noise model and outputting information on the updated noise model as required .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6434417B1

Filed: 2000-03-28     Issued: 2002-08-13

Method and system for detecting cardiac depolarization

(Original Assignee) Cardiac Pacemakers Inc     (Current Assignee) Cardiac Pacemakers Inc

Eric G. Lovett
US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (accordance therewith) .
US6434417B1
CLAIM 1
. A system for detecting a specific cardiac signal corresponding to a template signal , comprising : a sensing channel for sensing cardiac electrical activity and outputting a sense signal in accordance therewith (next frame) ;
a first bank of bandpass filters for decomposing the sense signal into multiple frequency components , wherein the bandpass filters are selected with passbands corresponding to frequency components of the template signal ;
a differentiator for extracting a derivative signal from each of the multiple frequency components of the sense signal ;
an amplitude demodulator for detecting an envelope signal from each of the derivative signals ;
and , a multiplier for multiplying the envelope signals and outputting a composite signal in accordance therewith , the composite signal representing a correspondence between the sensed signal and the template signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (accordance therewith) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US6434417B1
CLAIM 1
. A system for detecting a specific cardiac signal corresponding to a template signal , comprising : a sensing channel for sensing cardiac electrical activity and outputting a sense signal in accordance therewith (next frame) ;
a first bank of bandpass filters for decomposing the sense signal into multiple frequency components , wherein the bandpass filters are selected with passbands corresponding to frequency components of the template signal ;
a differentiator for extracting a derivative signal from each of the multiple frequency components of the sense signal ;
an amplitude demodulator for detecting an envelope signal from each of the derivative signals ;
and , a multiplier for multiplying the envelope signals and outputting a composite signal in accordance therewith , the composite signal representing a correspondence between the sensed signal and the template signal .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first filter) and prevent update of noise energy estimates on the music signal .
US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first filter) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (bandpass filter) to noise ratio (first filter) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US6434417B1
CLAIM 1
. A system for detecting a specific cardiac signal corresponding to a template signal , comprising : a sensing channel for sensing cardiac electrical activity and outputting a sense signal in accordance therewith ;
a first bank of bandpass filters (average signal) for decomposing the sense signal into multiple frequency components , wherein the bandpass filters are selected with passbands corresponding to frequency components of the template signal ;
a differentiator for extracting a derivative signal from each of the multiple frequency components of the sense signal ;
an amplitude demodulator for detecting an envelope signal from each of the derivative signals ;
and , a multiplier for multiplying the envelope signals and outputting a composite signal in accordance therewith , the composite signal representing a correspondence between the sensed signal and the template signal .

US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .
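
Claim 38 of US8990073B2, charted above, recites a comparator of the average SNR against a threshold that is a function of the long-term SNR, but the chart does not give that function. The following is a minimal sketch only, assuming a hypothetical linear threshold and assuming that a frame counts as active when the average SNR exceeds it. The names `snr_threshold` and `is_active` and the constants `k1` and `k2` are illustrative and are not taken from either patent.

```python
def snr_threshold(snr_lt, k1=0.1, k2=2.0):
    """Hypothetical threshold: a linear function of the long-term SNR (assumption)."""
    return k1 * snr_lt + k2


def is_active(snr_av, snr_lt, k1=0.1, k2=2.0):
    """Comparator: True when the average SNR exceeds the SNR_LT-dependent threshold."""
    return snr_av > snr_threshold(snr_lt, k1, k2)
```

With `k1=0.5` and `k2=1.0`, a long-term SNR of 10 gives a threshold of 6, so an average SNR of 7 is classified as active and an average SNR of 5 as inactive.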

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first filter) and preventing update of noise energy estimates .
US6434417B1
CLAIM 8
. The system of claim 6 further comprising a second filter bank with passbands dividing a lower frequency range than the frequency range of the first filter (background noise signal, noise ratio) bank , wherein the outputs of the second filter bank are multiplied together to form a second composite signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1437747A

Filed: 2000-02-29     Issued: 2003-08-20

Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder

(Original Assignee) Qualcomm Inc (高通股份有限公司)

A. Das (A·达斯)
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (including frequency) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
CN1437747A
CLAIM 8
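. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .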

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (including frequency) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
CN1437747A
CLAIM 8
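. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .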
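
Claim 2 of US8990073B2, charted above, builds the residual spectrum in three steps: find the minima of the spectrum, connect them to form a spectral floor, and subtract the floor. A minimal sketch of those steps follows. It assumes straight-line interpolation between minima and treats the first and last bins as minima; both choices, and the function names, are assumptions for illustration rather than details stated in the claim.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are always included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima), where residual = spectrum - spectral floor."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

For the toy spectrum `[1.0, 3.0, 1.0, 5.0, 2.0]` the minima are bins 0, 2 and 4, and the residual is `[0.0, 2.0, 0.0, 3.5, 0.0]`, leaving two peaks delimited by successive minima.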

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (including frequency) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
CN1437747A
CLAIM 8
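. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .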
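
Claims 1 and 5 of US8990073B2, charted above, describe the long-term correlation map as a per-bin one-pole filter driven by an update factor, the current correlation map, and an initial value, followed by a sum over bins. A minimal sketch follows, assuming the common recursion `alpha * previous + (1 - alpha) * current`. The exact recursion, the value of `alpha`, and the function names are assumptions, since the claims do not state them.

```python
def update_long_term_map(prev_map, cor_map, alpha):
    """One-pole (first-order recursive) smoothing, applied bin by bin."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_map, cor_map)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)
```

Starting from an initial map of zeros with `alpha = 0.5`, two frames with correlation map `[1.0, 0.0, 0.5]` give `[0.5, 0.0, 0.25]` and then `[0.75, 0.0, 0.375]`, so the summed value rises from 0.75 to 1.125 as the tonal structure persists.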

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (predetermined threshold) indicative of sound activity in the sound signal .
CN1437747A
CLAIM 6
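. The speech processor of claim 1 , further comprising a comparison circuit coupled to the coder for comparing an uncoded frame with a frame coded in the at least one frequency-domain coding mode and generating a performance measure from the comparison , wherein the coder applies the at least one time-domain coding mode only if the performance measure falls below a predetermined threshold (adaptive threshold) , and otherwise applies the at least one frequency-domain coding mode .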

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability (steady state) , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
CN1437747A
CLAIM 11
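. A method of processing frames , comprising the steps of : applying an open-loop coding mode selection process to each successive input frame to select either a time-domain coding mode or a frequency-domain coding mode based on the speech content of the input frame ; frequency-domain coding the input frame if its speech content indicates steady-state (pitch stability) voiced speech ; time-domain coding the input frame if its speech content indicates anything other than steady-state voiced speech ; comparing the frequency-domain coded frame with the input frame to obtain a performance measure ; and time-domain coding the input frame if the performance measure falls below a predetermined threshold .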

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (including frequency) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
CN1437747A
CLAIM 8
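. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .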
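
Claim 24 of US8990073B2, charted above, computes the spectral diversity as a weighted sum of per-band energy ratios between the current and previous frame, restricted to bands above a given index. A minimal sketch follows. It assumes a plain current-over-previous ratio, uniform weights by default, and a small `eps` guard against division by zero; the claim does not specify any of these, and the function and parameter names are illustrative.

```python
def spectral_diversity(cur_energy, prev_energy, given_band, weights=None, eps=1e-12):
    """Weighted sum of energy ratios over bands with index > given_band."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # uniform weights (assumption)
    total = 0.0
    for i in range(given_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / (prev_energy[i] + eps)
        total += weights[i] * ratio
    return total
```

For example, with current energies `[1, 2, 4, 8]`, previous energies `[1, 1, 2, 2]` and `given_band = 1`, only bands 2 and 3 contribute, with ratios of about 2 and 4.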

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (including frequency) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1437747A
CLAIM 8
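. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .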
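
Claim 28 of US8990073B2, charted above, derives the noise character parameter from the energy ratio of two groups of frequency bands and then tracks a long-term value of that ratio. A minimal sketch follows, assuming the group energies are simple sums of band energies and the long-term value is a first-order recursive average. The smoothing form, `alpha`, `eps`, and the function names are assumptions, not claim language.

```python
def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    return first / (second + eps)


def long_term_value(prev_lt, value, alpha=0.9):
    """Recursive average used here as the long-term value (assumed form)."""
    return alpha * prev_lt + (1.0 - alpha) * value
```

With band energies `[1, 1, 2, 4]` and `n_first = 2`, the first group holds an energy of 2 and the second group 6, so the parameter is about 1/3.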

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (including frequency) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1437747A
CLAIM 8
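. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .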

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (including frequency) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1437747A
CLAIM 8
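. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .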

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (including frequency) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
CN1437747A
CLAIM 8
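. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .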

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (including frequency) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
CN1437747A
CLAIM 8
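. The speech processor of claim 1 , wherein the at least one frequency-domain coding mode represents a short-term spectrum of each frame with a plurality of sinusoids each having a set of parameters including frequency (frequency spectrum, frequency bin, frequency bands, first frequency bands) , phase , and amplitude , wherein the phase is modeled with a polynomial expression and an initial phase value , and wherein the initial phase value is either (1) the final estimated phase value of the previous frame if the previous frame was coded with the at least one frequency-domain coding mode , or (2) a phase value derived from the short-term spectrum of the previous frame if the previous frame was coded with the at least one time-domain coding mode .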




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7058572B1

Filed: 2000-01-28     Issued: 2006-06-06

Reducing acoustic noise in wireless and landline based telephony

(Original Assignee) Nortel Networks Ltd     (Current Assignee) Apple Inc

Elias J. Nemer
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (sampled values) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US7058572B1
CLAIM 18
. The method of claim 17 wherein said skewness value of said LPC residual is determined by the following relation : $SK = \frac{1}{N} \sum_{n=0}^{N-1} [e(n)]^3$ , wherein e(n) are sampled values (frequency bins) of an LPC residual , and N is a frame length .
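
Claim 4 of US8990073B2, charted above, forms the correlation map by computing one normalized correlation value per detected peak, over the bins between the two minima that delimit it, and assigning that value to every bin of the peak. A minimal sketch follows. The claim only says "normalized correlation", so the cosine-style normalization used here, the zero score for an empty denominator, and the function name are assumptions. A minimum shared by two peaks takes the score of the later peak in this sketch.

```python
import math


def correlation_map(cur_res, prev_res, minima):
    """Per-peak normalized correlation between current and previous residual spectra."""
    cor_map = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        cur = cur_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(x * y for x, y in zip(cur, prev))
        den = math.sqrt(sum(x * x for x in cur) * sum(y * y for y in prev))
        score = num / den if den > 0.0 else 0.0
        for i in range(a, b + 1):
            cor_map[i] = score  # same score for every bin of the peak
    return cor_map
```

For instance, with minima at bins 0, 2 and 4, a peak that is identical in both frames scores 1.0, while a peak that is absent from the previous frame scores 0.0.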

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (one band, square root) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (sampled values) so as to produce a summed long-term correlation map .
US7058572B1
CLAIM 6
. The method of claim 1 wherein said respective filter gain value is determined by the following relation : $G(f) = C \cdot \sqrt{SNR_{prior}(f)}$ , that is , C times the square root (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) of SNRprior(f) , wherein SNRprior is said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 18
. The method of claim 17 wherein said skewness value of said LPC residual is determined by the following relation : $SK = \frac{1}{N} \sum_{n=0}^{N-1} [e(n)]^3$ , wherein e(n) are sampled values (frequency bins) of an LPC residual , and N is a frame length .

US7058572B1
CLAIM 31
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining , as a function of a linear predictive coding (LPC) prediction error , whether at least a respective one of said plurality of frames is a non-speech frame ;
estimating , when said at least one of said plurality of frames is a non-speech frame , a noise energy level of at least one of said plurality of bands of said at least a respective one of said plurality of frames ;
and filtering said at least one band (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) as a function of said estimated noise level .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (sampled values) having a magnitude that exceeds a given fixed threshold .
US7058572B1
CLAIM 18
. The method of claim 17 wherein said skewness value of said LPC residual is determined by the following relation : $SK = \frac{1}{N} \sum_{n=0}^{N-1} [e(n)]^3$ , wherein e(n) are sampled values (frequency bins) of an LPC residual , and N is a frame length .
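
The skewness relation quoted from claim 18 of US7058572B1 is the mean of the cubed LPC residual samples over one frame. A minimal sketch of that relation follows; the function name is illustrative, and the residual `e` is assumed to be a non-empty list holding the samples of a single frame.

```python
def lpc_residual_skewness(e):
    """SK = (1/N) * sum of e(n)**3 over the N samples of one frame."""
    n = len(e)
    return sum(x ** 3 for x in e) / n
```

A residual that is symmetric about zero, such as `[-1.0, 0.0, 1.0]`, gives a skewness of zero, which is the property the reference relies on when it treats near-zero skewness as a noise indicator.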

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (second threshold) indicative of sound activity in the sound signal .
US7058572B1
CLAIM 28
. The method of claim 24 wherein said update constant α has a value of 0.002 when a watchdog timer is expired and said linear predictive coding (LPC) prediction error (PE) exceeds a predefined LPC prediction error threshold value T_PE1 ;
said update constant α has a value of 0.05 when said at least one of said plurality of frames is stationary ;
said update constant α has a value of 0.1 when a noise likelihood value is less than a noise likelihood threshold value T_LIK and said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 such that said at least one of said plurality of frames is a non-speech frame ;
said update constant α has a value of 0.05 when an absolute value of a normalized skewness of a LPC residual is less than a first threshold value T_a , said skewness of said LPC residual being normalized by total energy , or is less than a second threshold (adaptive threshold) value T_b , said skewness of said LPC residual being normalized by a variance of said skewness of said LPC residual , and when said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 so that said LPC residual of said at least one of said plurality of frames has substantially zero skewness ;
and said update constant α has a value of 0.1 when a current value of said estimated noise energy level is greater than a total energy of said plurality of frames .
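
Claim 28 of US7058572B1, quoted above, lists five conditions and the update constant α assigned under each. The sketch below encodes that list as a rule table. The claim does not say which value applies when several conditions hold at once, so checking the rules in claim order is an assumption, as are the function name, the boolean inputs (each standing for one fully evaluated condition), and the `None` result when no condition holds.

```python
def select_update_constant(watchdog_and_pe_high, stationary, non_speech,
                           zero_skewness, noise_above_total):
    """Return alpha for the first satisfied condition, in claim order (assumption)."""
    rules = [
        (watchdog_and_pe_high, 0.002),  # watchdog expired and PE > T_PE1
        (stationary, 0.05),             # frame is stationary
        (non_speech, 0.1),              # likelihood < T_LIK and PE > T_PE2
        (zero_skewness, 0.05),          # normalized skewness small and PE > T_PE2
        (noise_above_total, 0.1),       # noise estimate exceeds total energy
    ]
    for condition, alpha in rules:
        if condition:
            return alpha
    return None  # no condition met; the claim does not cover this case
```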

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (one band, square root) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US7058572B1
CLAIM 6
. The method of claim 1 wherein said respective filter gain value is determined by the following relation : $G(f) = C \cdot \sqrt{SNR_{prior}(f)}$ , that is , C times the square root (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) of SNRprior(f) , wherein SNRprior is said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 31
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining , as a function of a linear predictive coding (LPC) prediction error , whether at least a respective one of said plurality of frames is a non-speech frame ;
estimating , when said at least one of said plurality of frames is a non-speech frame , a noise energy level of at least one of said plurality of bands of said at least a respective one of said plurality of frames ;
and filtering said at least one band (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) as a function of said estimated noise level .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (first threshold value, previous frame) indicative of an activity of the sound signal .
US7058572B1
CLAIM 1
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining a respective total signal energy and a respective current estimate of the noise energy for at least one of said plurality of frequency bands of at least one of said plurality of frames , wherein said respective current estimate of the noise energy is determined as a function of a linear predictive coding (LPC) prediction error ;
determining a respective local signal-to-noise ratio (SNRpost) for said at least one of said plurality of frequency bands as a function of said respective signal energy and said respective current estimate of the noise energy ;
determining a respective smoothed signal-to-noise ratio (SNRprior) for said at least one of said plurality of frequency bands from said respective local signal-to-noise ratio and another respective signal-to-noise ratio (SNRest) estimated for a previous frame (activity prediction parameter) ;
and calculating a respective filter gain value for said at least one of said plurality of frequency bands from said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 28
. The method of claim 24 wherein said update constant α has a value of 0.002 when a watchdog timer is expired and said linear predictive coding (LPC) prediction error (PE) exceeds a predefined LPC prediction error threshold value T_PE1 ;
said update constant α has a value of 0.05 when said at least one of said plurality of frames is stationary ;
said update constant α has a value of 0.1 when a noise likelihood value is less than a noise likelihood threshold value T_LIK and said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 such that said at least one of said plurality of frames is a non-speech frame ;
said update constant α has a value of 0.05 when an absolute value of a normalized skewness of a LPC residual is less than a first threshold value (activity prediction parameter) T_a , said skewness of said LPC residual being normalized by total energy , or is less than a second threshold value T_b , said skewness of said LPC residual being normalized by a variance of said skewness of said LPC residual , and when said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 so that said LPC residual of said at least one of said plurality of frames has substantially zero skewness ;
and said update constant α has a value of 0.1 when a current value of said estimated noise energy level is greater than a total energy of said plurality of frames .
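
Claim 1 of US7058572B1, quoted above, chains three per-band quantities: a local SNR (SNRpost), a smoothed SNR (SNRprior) that also uses the previous frame's SNR estimate (SNRest), and a filter gain; claim 6 of the same reference gives the gain as C times the square root of SNRprior. The claims do not state how SNRpost or the smoothing is computed, so the sketch below assumes a plain energy ratio and a weighted average with a hypothetical weight `beta`. The function name, `beta`, and `eps` are illustrative.

```python
import math


def filter_gain(signal_energy, noise_energy, snr_est_prev, c=1.0, beta=0.98, eps=1e-12):
    """Gain for one frequency band: G = C * sqrt(SNRprior)."""
    # Local SNR as an energy ratio (assumed definition).
    snr_post = signal_energy / (noise_energy + eps)
    # Smoothed SNR as a weighted average with the previous estimate (assumed form).
    snr_prior = (1.0 - beta) * snr_post + beta * snr_est_prev
    return c * math.sqrt(max(snr_prior, 0.0))
```

Nothing in this sketch limits the gain to 1; whether and how the reference bounds it is not stated in the quoted claims.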

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (first threshold value, previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US7058572B1
CLAIM 1
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining a respective total signal energy and a respective current estimate of the noise energy for at least one of said plurality of frequency bands of at least one of said plurality of frames , wherein said respective current estimate of the noise energy is determined as a function of a linear predictive coding (LPC) prediction error ;
determining a respective local signal-to-noise ratio (SNRpost) for said at least one of said plurality of frequency bands as a function of said respective signal energy and said respective current estimate of the noise energy ;
determining a respective smoothed signal-to-noise ratio (SNRprior) for said at least one of said plurality of frequency bands from said respective local signal-to-noise ratio and another respective signal-to-noise ratio (SNRest) estimated for a previous frame (activity prediction parameter) ;
and calculating a respective filter gain value for said at least one of said plurality of frequency bands from said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 28
. The method of claim 24 wherein said update constant α has a value of 0.002 when a watchdog timer is expired and said linear predictive coding (LPC) prediction error (PE) exceeds a predefined LPC prediction error threshold value T_PE1 ;
said update constant α has a value of 0.05 when said at least one of said plurality of frames is stationary ;
said update constant α has a value of 0.1 when a noise likelihood value is less than a noise likelihood threshold value T_LIK and said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 such that said at least one of said plurality of frames is a non-speech frame ;
said update constant α has a value of 0.05 when an absolute value of a normalized skewness of a LPC residual is less than a first threshold value (activity prediction parameter) T_a , said skewness of said LPC residual being normalized by total energy , or is less than a second threshold value T_b , said skewness of said LPC residual being normalized by a variance of said skewness of said LPC residual , and when said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 so that said LPC residual of said at least one of said plurality of frames has substantially zero skewness ;
and said update constant α has a value of 0.1 when a current value of said estimated noise energy level is greater than a total energy of said plurality of frames .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (first threshold value, previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US7058572B1
CLAIM 1
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining a respective total signal energy and a respective current estimate of the noise energy for at least one of said plurality of frequency bands of at least one of said plurality of frames , wherein said respective current estimate of the noise energy is determined as a function of a linear predictive coding (LPC) prediction error ;
determining a respective local signal-to-noise ratio (SNRpost) for said at least one of said plurality of frequency bands as a function of said respective signal energy and said respective current estimate of the noise energy ;
determining a respective smoothed signal-to-noise ratio (SNRprior) for said at least one of said plurality of frequency bands from said respective local signal-to-noise ratio and another respective signal-to-noise ratio (SNRest) estimated for a previous frame (activity prediction parameter) ;
and calculating a respective filter gain value for said at least one of said plurality of frequency bands from said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 28
. The method of claim 24 wherein said update constant α has a value of 0.002 when a watchdog timer is expired and said linear predictive coding (LPC) prediction error (PE) exceeds a predefined LPC prediction error threshold value T_PE1 ;
said update constant α has a value of 0.05 when said at least one of said plurality of frames is stationary ;
said update constant α has a value of 0.1 when a noise likelihood value is less than a noise likelihood threshold value T_LIK and said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 such that said at least one of said plurality of frames is a non-speech frame ;
said update constant α has a value of 0.05 when an absolute value of a normalized skewness of a LPC residual is less than a first threshold value (activity prediction parameter) T_a , said skewness of said LPC residual being normalized by total energy , or is less than a second threshold value T_b , said skewness of said LPC residual being normalized by a variance of said skewness of said LPC residual , and when said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 so that said LPC residual of said at least one of said plurality of frames has substantially zero skewness ;
and said update constant α has a value of 0.1 when a current value of said estimated noise energy level is greater than a total energy of said plurality of frames .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (one band, square root) into a first group of a certain number of first frequency bands and a second group (absolute value) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
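
The four steps of claim 28 can be pictured with a short sketch. It is illustrative only and not taken from the patent: the split index `n_first`, the smoothing factor `alpha`, the `eps` guard, the reading of the ratio as first group over second group, and the use of an exponential average as the "long-term value" are all assumptions.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    n_first and eps are hypothetical parameters chosen for illustration.
    """
    first = sum(band_energies[:n_first])   # first energy value
    rest = sum(band_energies[n_first:])    # second energy value
    return first / (rest + eps)


def update_long_term(previous, current, alpha=0.9):
    """Assumed long-term value: exponential average of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * current
```

A per-frame loop would call `noise_character` on the current band energies and feed the result into `update_long_term`.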
US7058572B1
CLAIM 6
. The method of claim 1 wherein said respective filter gain value is determined by the following relation : $G(f) = C \cdot \sqrt{SNR_{prior}(f)}$ , that is , C times the square root (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) of SNRprior(f) , wherein SNRprior is said respective smoothed signal-to-noise ratio .
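
The gain relation recited in claim 6 of US7058572B1 amounts to a per-band square root scaled by a constant. A minimal sketch follows, assuming the smoothed SNR values are already available as a list; the default `c=1.0` and the clamp of negative inputs to zero are assumptions added for illustration.

```python
import math


def filter_gain(snr_prior, c=1.0):
    """Per-band gain G(f) = C * sqrt(SNRprior(f)).

    Negative inputs are clamped to zero (an assumption, so sqrt stays defined).
    """
    return [c * math.sqrt(max(s, 0.0)) for s in snr_prior]
```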

US7058572B1
CLAIM 7
. The method of claim 1 further comprising the step of forming said at least one of said plurality of frames from a first number (first frequency) of new speech samples and a second number of prior speech samples .

US7058572B1
CLAIM 28
. The method of claim 24 wherein said update constant α has a value of 0.002 when a watchdog timer is expired and said linear predictive coding (LPC) prediction error (PE) exceeds a predefined LPC prediction error threshold value T_PE1 ;
said update constant α has a value of 0.05 when said at least one of said plurality of frames is stationary ;
said update constant α has a value of 0.1 when a noise likelihood value is less than a noise likelihood threshold value T_LIK and said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 such that said at least one of said plurality of frames is a non-speech frame ;
said update constant α has a value of 0.05 when an absolute value (second group) of a normalized skewness of a LPC residual is less than a first threshold value T_a , said skewness of said LPC residual being normalized by total energy , or is less than a second threshold value T_b , said skewness of said LPC residual being normalized by a variance of said skewness of said LPC residual , and when said LPC prediction error PE is greater than a predefined LPC prediction error threshold value T_PE2 so that said LPC residual of said at least one of said plurality of frames has substantially zero skewness ;
and said update constant α has a value of 0.1 when a current value of said estimated noise energy level is greater than a total energy of said plurality of frames .

US7058572B1
CLAIM 31
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining , as a function of a linear predictive coding (LPC) prediction error , whether at least a respective one of said plurality of frames is a non-speech frame ;
estimating , when said at least one of said plurality of frames is a non-speech frame , a noise energy level of at least one of said plurality of bands of said at least a respective one of said plurality of frames ;
and filtering said at least one band (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) as a function of said estimated noise level .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (one band, square root) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (sampled values) so as to produce a summed long-term correlation map .
US7058572B1
CLAIM 6
. The method of claim 1 wherein said respective filter gain value is determined by the following relation : $G(f) = C \cdot \sqrt{SNR_{prior}(f)}$ , that is , C times the square root (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) of SNRprior(f) , wherein SNRprior is said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 18
. The method of claim 17 wherein said skewness value of said LPC residual is determined by the following relation : $SK = \frac{1}{N} \sum_{n=0}^{N-1} [e(n)]^3$ , wherein e(n) are sampled values (frequency bins) of an LPC residual , and N is a frame length .
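
The skewness relation in claim 18 of US7058572B1 is the mean of the cubed residual samples over one frame. A minimal sketch, assuming the LPC residual of the frame is supplied as a plain list; the function name is hypothetical.

```python
def lpc_residual_skewness(e):
    """SK = (1/N) * sum of e(n)**3 for n = 0..N-1, with N the frame length."""
    n = len(e)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    return sum(x ** 3 for x in e) / n
```

A residual that is symmetric about zero gives a value of zero, which matches the "substantially zero skewness" condition mentioned in claim 28 of the same reference.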

US7058572B1
CLAIM 31
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining , as a function of a linear predictive coding (LPC) prediction error , whether at least a respective one of said plurality of frames is a non-speech frame ;
estimating , when said at least one of said plurality of frames is a non-speech frame , a noise energy level of at least one of said plurality of bands of said at least a respective one of said plurality of frames ;
and filtering said at least one band (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) as a function of said estimated noise level .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (one band, square root) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US7058572B1
CLAIM 6
. The method of claim 1 wherein said respective filter gain value is determined by the following relation : $G(f) = C \cdot \sqrt{SNR_{prior}(f)}$ , that is , C times the square root (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) of SNRprior(f) , wherein SNRprior is said respective smoothed signal-to-noise ratio .

US7058572B1
CLAIM 31
. A method of reducing noise in a transmitted signal comprised of a plurality of frames , each of said frames including a plurality of frequency bands ;
said method comprising the steps of : determining , as a function of a linear predictive coding (LPC) prediction error , whether at least a respective one of said plurality of frames is a non-speech frame ;
estimating , when said at least one of said plurality of frames is a non-speech frame , a noise energy level of at least one of said plurality of bands of said at least a respective one of said plurality of frames ;
and filtering said at least one band (frequency bin, frequency bands, first frequency bands, noise ratio, frequency dependent signal) as a function of said estimated noise level .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6717991B1

Filed: 2000-01-28     Issued: 2004-04-06

System and method for dual microphone signal noise reduction using spectral subtraction

(Original Assignee) Telefonaktiebolaget LM Ericsson AB     (Current Assignee) Optis Wireless Technology LLC

Harald Gustafsson, Ulf Lindgren, Ingvar Claesson, Sven Nordholm
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (energy levels) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
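
The two steps of claim 5 can be sketched as a first-order recursive smoother applied independently to each frequency bin, followed by a sum over bins. This is an illustration under stated assumptions: the coefficient `alpha` and the function names are hypothetical, and the claim does not fix the filter coefficient.

```python
def one_pole_update(long_term_map, cor_map, alpha=0.9):
    """Filter the correlation map bin by bin through a one-pole smoother.

    alpha is an assumed update factor, not a value from the patent.
    """
    if len(long_term_map) != len(cor_map):
        raise ValueError("maps must have the same number of bins")
    return [alpha * lt + (1.0 - alpha) * c
            for lt, c in zip(long_term_map, cor_map)]


def summed_map(long_term_map):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term_map)
```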
US6717991B1
CLAIM 12
. The noise reduction system of claim 1 , wherein the controller substantially equalizes energy levels (frequency bin basis) of the first signal and the second signal .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (scalar multiplication) .
US6717991B1
CLAIM 11
. The noise reduction system of claim 2 , wherein the subtraction factors k_1 , k_2 , and k_3 are derived as $k_1(i) = (1 - \bar{\gamma}(i)) \cdot t_1 + r_1$ , $k_2(i) = \bar{\gamma}(i) \cdot t_2 + r_2$ , $k_3(i) = (1 - \bar{\gamma}(i)) \cdot t_3 + r_3$ , where t_1 , t_2 , and t_3 are scalar multiplication (SNR calculation) factors , r_1 , r_2 , and r_3 are additive factors , and $\bar{\gamma}(i)$ is an averaged square correlation sum of the first signal and the second signal .
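
The three subtraction factors in claim 11 of US6717991B1 are affine functions of the averaged square correlation sum. A minimal sketch, assuming the averaged correlation for frame i has already been computed; the function name and the tuple layout of `t` and `r` are hypothetical.

```python
def subtraction_factors(gamma_bar, t, r):
    """Return (k1, k2, k3) for one frame from the averaged square correlation sum.

    t = (t1, t2, t3) are the scalar multiplication factors,
    r = (r1, r2, r3) are the additive factors.
    """
    t1, t2, t3 = t
    r1, r2, r3 = r
    k1 = (1.0 - gamma_bar) * t1 + r1
    k2 = gamma_bar * t2 + r2
    k3 = (1.0 - gamma_bar) * t3 + r3
    return k1, k2, k3
```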

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal) indicative of an activity of the sound signal .
US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
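
The condition in claim 27 is a conjunction of two threshold tests. A minimal sketch follows; the threshold values `act_thr` and `nonstat_thr` are placeholders invented for illustration, since the claim only states that each threshold is fixed.

```python
def noise_update_allowed(act_pred, nonstat, act_thr=0.8, nonstat_thr=2.0):
    """Return False (update prevented) only when both parameters exceed
    their thresholds at the same time; threshold values are hypothetical."""
    return not (act_pred > act_thr and nonstat > nonstat_thr)
```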
US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal) comprises : dividing a plurality of frequency bands into a first group (weighting function) of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6717991B1
CLAIM 19
. The noise reduction system of claim 14 , wherein a frequency dependent weighting function (first group) , performed by at least one of the first and second spectral subtraction processors , is used to derive at least one of a first and second frequency dependent positive measurement .

US6717991B1
CLAIM 20
. The noise reduction system of claim 19 , wherein the noise signal measurement is derived from at least one of the first signal and the second signal and at least one of the first frequency (first frequency) dependent positive measurement and the second frequency dependent positive measurement .

US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal) inferior than a given fixed threshold .
US6717991B1
CLAIM 25
. The noise reduction system of claim 22 , wherein the desired signal measurement is a speech signal (noise character parameter, activity prediction parameter) measurement .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (energy levels) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6717991B1
CLAIM 12
. The noise reduction system of claim 1 , wherein the controller substantially equalizes energy levels (frequency bin basis) of the first signal and the second signal .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (controller estimates) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6717991B1
CLAIM 2
. The noise reduction system of claim 1 , wherein the controller estimates (updating noise energy estimates) a correlation between the first signal and the second signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6339758B1

Filed: 1999-07-30     Issued: 2002-01-15

Noise suppress processing apparatus and method

(Original Assignee) Toshiba Corp     (Current Assignee) Toshiba Corp

Hiroshi Kanazawa, Masami Akamine
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise component, arrival direction, second noise) using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
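
The last three steps of claim 1 can be sketched as follows, taking the current and previous residual spectra as given. This is a sketch under stated assumptions and not the patented implementation: the normalized squared correlation used as the per-peak measure, the treatment of both end bins as minima, the update factor `alpha`, and all function names are assumptions.

```python
def find_minima(x):
    """Indices of local minima; both end bins are always included (assumption)."""
    if len(x) < 2:
        return list(range(len(x)))
    inner = [i for i in range(1, len(x) - 1)
             if x[i] < x[i - 1] and x[i] <= x[i + 1]]
    return [0] + inner + [len(x) - 1]


def correlation_map(cur_res, prev_res):
    """One correlation value per detected peak, written to every bin of that peak.

    A peak is the piece of cur_res between two successive minima; it is compared
    with the shape at the same position in prev_res.
    """
    minima = find_minima(cur_res)
    cor = [0.0] * len(cur_res)
    for lo, hi in zip(minima, minima[1:]):
        a = cur_res[lo:hi + 1]
        b = prev_res[lo:hi + 1]
        num = sum(p * q for p, q in zip(a, b)) ** 2
        den = sum(p * p for p in a) * sum(q * q for q in b)
        value = num / den if den > 0 else 0.0
        for k in range(lo, hi + 1):
            cor[k] = value
    return cor


def update_long_term_map(prev_map, cor, alpha=0.9):
    """Long-term map from an update factor, the current map and the prior state."""
    return [alpha * m + (1.0 - alpha) * c for m, c in zip(prev_map, cor)]
```

Starting the long-term map at all zeros is one possible choice of initial value. A spectrum whose peaks keep their shape from frame to frame drives the map toward one, and an unstable spectrum keeps it near zero.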
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to output noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform , and outputs frequency spectrum (frequency spectrum) data in units of channels .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal (noise component, arrival direction, second noise) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
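
The three steps of claim 2 can be sketched as a search for local minima, a piecewise-linear floor drawn through them, and a subtraction. The sketch is illustrative only: straight-line interpolation as the way of "connecting" minima, the treatment of the first and last bins as minima, and the function name are assumptions.

```python
def residual_spectrum(spec):
    """Subtract a piecewise-linear spectral floor that connects the minima."""
    n = len(spec)
    if n < 2:
        return [0.0] * n
    # Step 1: search for the minima (end bins included by assumption).
    minima = [0] + [i for i in range(1, n - 1)
                    if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]] + [n - 1]
    # Step 2: connect successive minima with straight lines to form the floor.
    floor = [0.0] * n
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            frac = (k - lo) / (hi - lo)
            floor[k] = spec[lo] + frac * (spec[hi] - spec[lo])
    # Step 3: subtract the floor from the spectrum.
    return [s - f for s, f in zip(spec, floor)]
```

With this construction the residual is zero at every interior minimum, so the peaks of the residual are the pieces lying between successive zeros.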
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to output noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform , and outputs frequency spectrum (frequency spectrum) data in units of channels .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (first range) so as to produce a summed long-term correlation map .
US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform (frequency bin) , and outputs frequency spectrum data in units of channels .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise component, arrival direction, second noise) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to output noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component, arrival direction, second noise) comprises searching in the correlation map for frequency bins (first range) having a magnitude that exceeds a given fixed threshold .
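
The search recited in claim 7 reduces to a scan of the correlation map against one fixed threshold. A minimal sketch, assuming the map is a list with one value per frequency bin; the default threshold of 0.95 is a placeholder and not a value from the patent.

```python
def strong_tone_bins(cor_map, threshold=0.95):
    """Indices of frequency bins whose correlation value exceeds the threshold."""
    return [i for i, v in enumerate(cor_map) if v > threshold]
```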
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to output noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component, arrival direction, second noise) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (noise component, arrival direction, second noise) in the sound signal .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
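
Illustrative note on US8990073B2 claim 8: the comparison recited there (a summed long-term correlation map tested against an adaptive threshold) can be sketched as follows. This is a minimal sketch for orientation only, not the patentee's implementation. The function names, the exponential smoothing with forgetting factor alpha, the hysteresis-style threshold adaptation, and all numeric values are assumptions introduced here; the claim does not specify them. The per-bin correlation map is taken as an already computed input.

```python
def update_long_term_map(prev_map, cor_map, alpha=0.9):
    """Smooth the per-bin correlation map over time.

    alpha is an assumed forgetting factor, not a value taken from the claim.
    """
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_map, cor_map)]


def detect_strong_tones(long_term_map, threshold):
    """Return True when the summed long-term map exceeds the threshold."""
    return sum(long_term_map) > threshold


def adapt_threshold(threshold, tonal, step=0.25, lo=0.5, hi=1.5):
    """Assumed hysteresis rule: lower the threshold after a tonal decision,
    raise it otherwise, and keep it inside [lo, hi]."""
    threshold = threshold - step if tonal else threshold + step
    return min(hi, max(lo, threshold))


# Example frame: start from an all-zero long-term map with two bins.
lt_map = update_long_term_map([0.0, 0.0], [1.0, 0.5], alpha=0.5)
tonal = detect_strong_tones(lt_map, threshold=0.7)
next_threshold = adapt_threshold(1.0, tonal)
```

The threshold is adapted from the previous decision, so a signal that has already been flagged as tonal stays flagged more easily; that choice is one plausible reading of "adaptive" and is made here only for illustration.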

US8990073B2
CLAIM 10
. A method for detecting sound activity (noise component, arrival direction, second noise) in a sound signal (noise component, arrival direction, second noise) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (noise component, arrival direction, second noise) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (noise component, arrival direction, second noise) is detected .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (noise component, arrival direction, second noise) in the sound signal (noise component, arrival direction, second noise) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component, arrival direction, second noise) detection comprises detecting the sound signal (noise component, arrival direction, second noise) based on a frequency dependent signal-to-noise ratio (SNR) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component, arrival direction, second noise) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
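
Illustrative note on US8990073B2 claim 14: the decision recited there compares an average SNR (SNR av) with a threshold computed from a long-term SNR (SNR LT). The sketch below assumes a linear threshold function; the claim only requires that the threshold be "a function of" SNR LT, so the linear form, the slope and offset values, and the function names are hypothetical.

```python
def sad_threshold(snr_lt, slope=0.1, offset=10.0):
    """Threshold as a function of the long-term SNR.

    The linear form and its coefficients are assumptions for illustration.
    """
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt, slope=0.1, offset=10.0):
    """Classify the frame as active when the average SNR exceeds the threshold."""
    return snr_av > sad_threshold(snr_lt, slope, offset)


# Example with assumed coefficients: threshold = 0.5 * 20 + 2 = 12.
threshold = sad_threshold(20.0, slope=0.5, offset=2.0)
active = is_active(13.0, 20.0, slope=0.5, offset=2.0)
inactive = is_active(11.0, 20.0, slope=0.5, offset=2.0)
```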

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component, arrival direction, second noise) detection in the sound signal (noise component, arrival direction, second noise) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise power) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 2
. An apparatus according to claim 1 , further comprising a spectrum subtraction noise suppression section including a speech band power calculator section which divides the obtained speech frequency components in units of frequency bands and calculates speech power for each band , a noise band power calculator section which divides the obtained noise frequency components in units of frequency bands and calculates noise power (SNR LT, SNR calculation) for each band , and a spectrum subtractor section which suppresses background noise by weighting in units of frequency bands of speech signals on the basis of the speech and noise frequency band power values obtained by said speech and noise band power calculator sections .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component, arrival direction, second noise) detection further comprises updating the noise estimates for a next frame .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise component, arrival direction, second noise) and a ratio between a second order and a sixteenth order of linear prediction (speech input) residual error energies .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input (linear prediction) section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise component, arrival direction, second noise) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise component, arrival direction, second noise) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
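
Claims 18 and 19 of US8990073B2 above recite the two halves of a single decision rule: the frame is inactive when the average signal-to-noise ratio is below the calculated threshold and active when it is above. The following is a minimal Python sketch of that comparison only. It assumes the average SNR and the threshold have already been obtained as recited in claim 14, which is not reproduced in this part of the chart. The function name and the string labels are hypothetical, and the equality case is left undecided because neither claim addresses it.

```python
from typing import Optional


def classify_sound_activity(snr_av: float, threshold: float) -> Optional[str]:
    """Compare the average SNR with the threshold (claims 18 and 19).

    Returns "inactive" when snr_av is below the threshold, "active" when it
    is above, and None when the two are equal (a case the claims leave open).
    """
    if snr_av < threshold:
        return "inactive"   # claim 18: SNR_av inferior to the threshold
    if snr_av > threshold:
        return "active"     # claim 19: SNR_av larger than the threshold
    return None             # assumption: equality is not classified here
```

For example, with a hypothetical threshold of 10.0, an average SNR of 4.0 falls on the inactive side and 15.0 on the active side.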

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise component, arrival direction, second noise) prevents updating of noise energy estimates when a music signal (noise component, arrival direction, second noise) is detected .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (noise component, arrival direction, second noise) from a background noise signal and prevent update of noise energy estimates on the music signal .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise component, arrival direction, second noise) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
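
Claim 24 of US8990073B2 above recites two steps: a per-band ratio between the energy in the current frame and the energy in the previous frame, taken only for frequency bands higher than a given number, followed by a weighted sum of those ratios. The Python sketch below illustrates that arithmetic under stated assumptions. The function and parameter names are hypothetical, "higher than a given number" is read as band index strictly greater than `band_min`, the weights default to 1.0 because the claim does not state them, and previous-frame energies are assumed to be positive.

```python
from typing import Optional, Sequence


def spectral_diversity(
    energy_cur: Sequence[float],
    energy_prev: Sequence[float],
    band_min: int,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of current/previous energy ratios over bands > band_min."""
    bands = range(band_min + 1, len(energy_cur))
    # Step 1: ratio of current-frame to previous-frame energy in each band.
    ratios = [energy_cur[i] / energy_prev[i] for i in bands]
    # Assumption: uniform weights when none are supplied.
    if weights is None:
        weights = [1.0] * len(ratios)
    # Step 2: weighted sum of the ratios over the selected bands.
    return sum(w * r for w, r in zip(weights, ratios))
```

With four bands, current energies `[1, 2, 4, 8]`, previous energies `[1, 1, 2, 2]` and `band_min = 1`, only bands 2 and 3 contribute, giving ratios 2 and 4.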

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (noise component, arrival direction, second noise) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise component, arrival direction, second noise) and the complementary non-stationarity parameter .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise component, arrival direction, second noise) using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform , and outputs frequency spectrum (frequency spectrum) data in units of channels .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
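
Illustrative note (not part of either patent's claim language): the peak-detection and correlation-map elements of claim 30 of US8990073B2 can be sketched as below. The claim states only that peaks are the pieces of the residual spectrum between successive minima and that each peak is correlated with the shape at the same position in the previous residual spectrum. The normalized squared correlation, the treatment of the first and last bins as minima, and the names find_peaks and correlation_map are assumptions made for this sketch.

```python
def find_peaks(residual):
    """Return (start, end) bin pairs; each peak is the piece of the
    residual spectrum between two successive minima.
    Assumption: the first and last bins are treated as minima."""
    n = len(residual)
    minima = [0]
    for i in range(1, n - 1):
        if residual[i] < residual[i - 1] and residual[i] <= residual[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return list(zip(minima, minima[1:]))


def correlation_map(current, previous):
    """Per-bin correlation map between each detected peak of the current
    residual spectrum and the same bins of the previous residual spectrum.
    Assumption: a normalized squared correlation is used as the measure."""
    cor_map = [0.0] * len(current)
    for start, end in find_peaks(current):
        cur = current[start:end + 1]
        prev = previous[start:end + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        value = num / den if den > 0 else 0.0
        # Every bin belonging to the peak carries the peak's correlation value.
        for i in range(start, end + 1):
            cor_map[i] = value
    return cor_map
```

Under these assumptions, a peak whose shape is unchanged from the previous frame yields a value of 1.0 on its bins, and a peak with no energy at that position in the previous frame yields 0.0.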

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise component, arrival direction, second noise) using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform , and outputs frequency spectrum (frequency spectrum) data in units of channels .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal (noise component, arrival direction, second noise) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform , and outputs frequency spectrum (frequency spectrum) data in units of channels .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .
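
Illustrative note (not part of either patent's claim language): the three elements of claim 32 of US8990073B2 (locating the minima, connecting them to estimate a spectral floor, and subtracting the floor) can be sketched as below. The claim says only that the floor "connects the minima"; the piecewise-linear connection, the inclusion of the first and last bins as anchor points, and all function names are assumptions made for this sketch.

```python
def local_minima(spectrum):
    """Indices of local minima of the spectrum.
    Assumption: the first and last bins are always included so that the
    floor spans the whole spectrum. Requires at least two bins."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated spectral floor from the spectrum."""
    floor = spectral_floor(spectrum, local_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]
```

For the spectrum [1.0, 3.0, 1.0, 5.0, 2.0], the minima fall at bins 0, 2 and 4, and the residual is zero at those bins and positive at the two peaks between them.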

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (first range) so as to produce a summed long-term correlation map .
US6339758B1
CLAIM 4
. An apparatus according to claim 1 , wherein said frequency analyzer section converts the speech signal components for the plurality of channels in a time domain into signal components in a frequency domain by the fast Fourier transform (frequency bin) , and outputs frequency spectrum data in units of channels .
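
Illustrative note (not part of either patent's claim language): claim 33 of US8990073B2 recites a filter applied to the correlation map bin by bin and an adder that sums the filtered map, and claims 30 and 31 state that the long-term map depends on an update factor, the current frame's correlation map, and an initial value. A minimal sketch, assuming a first-order recursive (exponential) filter, is given below. The filter form, the default update factor alpha, the fixed threshold, and the function names are assumptions for illustration only.

```python
def update_long_term_map(long_term, cor_map, alpha=0.9):
    """Filter the correlation map on a bin-by-bin basis.
    Assumption: first-order recursion with update factor alpha;
    long_term holds the previous (or initial) long-term map."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]


def summed_long_term_map(long_term):
    """Sum the filtered correlation map over the frequency bins."""
    return sum(long_term)


def is_tonally_stable(long_term, threshold):
    """Hypothetical decision rule: compare the summed map with a threshold."""
    return summed_long_term_map(long_term) > threshold
```

Starting from an all-zero initial map, a signal whose peaks stay correlated from frame to frame drives the summed value upward over successive frames, while uncorrelated frames let it decay.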

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise component, arrival direction, second noise) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 35
. A device for detecting sound activity (noise component, arrival direction, second noise) in a sound signal (noise component, arrival direction, second noise) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (noise component, arrival direction, second noise) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 36
. A device for detecting sound activity (noise component, arrival direction, second noise) in a sound signal (noise component, arrival direction, second noise) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (noise component, arrival direction, second noise) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (noise component, arrival direction, second noise) detector .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise component (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) s ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (noise component, arrival direction, second noise) detector comprises a comparator of an average signal (noise component, arrival direction, second noise) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (noise component, arrival direction, second noise) detector .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise component, arrival direction, second noise) for distinguishing a music signal (noise component, arrival direction, second noise) from a background noise signal and preventing update of noise energy estimates .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise component, arrival direction, second noise) .
US6339758B1
CLAIM 1
. A noise suppression apparatus for independently outputting speech frequency components and noise frequency components , comprising : a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions ;
a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels ;
a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech ;
a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to outputting noise ;
a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section ;
a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by said second beam former processor section ;
a target speech direction correcting section which corrects a first input direction as an arrival direction (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) of the target speech to be input in said first beam former processor section on the basis of the target speech direction estimated by said target speech direction estimating section ;
and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in said second beam former processor section on the basis of the noise direction estimated by said noise direction estimating section .

US6339758B1
CLAIM 15
. A noise suppression method for independently outputting speech frequency components and noise frequency components , comprising the steps of : receiving speech uttered by a speaker at different positions to obtain speech signals of different channels ;
frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels ;
suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step , to output the target speech ;
suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components (average signal, sound activity, sound activity detection, sound signal, sound activity detector, detecting sound activity, sound signal prevents updating, music signal) ;
estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise ;
estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech ;
correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction ;
and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6226616B1

Filed: 1999-06-21     Issued: 2001-05-01

Sound quality of established low bit-rate audio coding systems without loss of decoder compatibility

(Original Assignee) Digital Theater Systems Inc     (Current Assignee) DTS LLC

Yu-Li You, William Paul Smith, Zoran Fejzo, Stephen Smyth
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, N subbands, readable storage) of the sound signal , the method comprising : calculating a current residual spectrum (lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
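The last element of claim 1 (a long-term correlation map built from an update factor, the current frame's correlation map, and an initial value) can be sketched as a per-bin recursive average. The recursion form, the default `alpha`, the default `initial` value, and the function name are assumptions; the claim does not fix any of them.

```python
def long_term_correlation_map(frame_maps, alpha=0.9, initial=0.0):
    """Run an assumed first-order recursion over per-frame correlation maps.

    frame_maps: list of correlation maps, one list of per-bin values per frame.
    alpha:      update factor (assumed value).
    initial:    initial value of every bin of the long-term map (assumed value).
    """
    if not frame_maps:
        return []
    long_term = [initial] * len(frame_maps[0])
    for cmap in frame_maps:
        # Each bin moves toward the current frame's correlation value.
        long_term = [alpha * lt + (1.0 - alpha) * c
                     for lt, c in zip(long_term, cmap)]
    return long_term
```

Bins that stay highly correlated from frame to frame accumulate large long-term values, which is the sense in which the map can indicate tonal stability.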
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (lower band) comprises : searching for the minima in the frequency spectrum (band signals, N subbands, readable storage) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
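The three steps of claim 2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. Treating the first and last bins as minima and connecting minima by straight lines are assumptions; the claim only says the minima are connected "with each other". All function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; the two end bins are included (an assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima by linear interpolation (assumed form)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor, and the minima used."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

For the toy spectrum `[1.0, 3.0, 1.0, 5.0, 2.0]`, the minima fall at bins 0, 2 and 4, and the residual is zero at those bins and positive in between.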
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (lower band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
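Claim 3's peak location step can be sketched in a few lines: given a residual spectrum and the indices of its minima, take the largest bin within each pair of consecutive minima. The function name and the tie-breaking rule (the first maximum wins) are assumptions.

```python
def locate_peaks(residual, minima):
    """Return the index of the maximum between each pair of consecutive minima.

    residual: residual spectrum as a list of per-bin values.
    minima:   sorted indices of the minima that delimit the peaks.
    """
    peaks = []
    for a, b in zip(minima, minima[1:]):
        # max() returns the first index when several bins share the maximum.
        peaks.append(max(range(a, b + 1), key=lambda k: residual[k]))
    return peaks
```

For a residual of `[0.0, 2.0, 0.0, 3.5, 0.0]` with minima at bins 0, 2 and 4, this returns bins 1 and 3.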
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (lower band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (band signals, N subbands, readable storage) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
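The correlation map of claim 4 can be sketched as below: one normalized correlation value per peak, computed over the bins between the two minima that delimit the peak, then copied to every bin in that range. The cosine-style normalization is an assumption and may differ from the exact formula in the patent's description; the function name is hypothetical.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin correlation map between current and previous residual spectra.

    current, previous: residual spectra of equal length.
    minima:            sorted indices of minima in the current residual spectrum.
    """
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0  # score assigned to this peak
        # A minimum shared by two adjacent peaks takes the later peak's score.
        for k in range(a, b + 1):
            cmap[k] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores close to 1 across its bins, while a peak with no counterpart in the previous residual spectrum scores 0.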
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (band signals, N subbands, readable storage) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (band signals, N subbands, readable storage) so as to produce a summed long-term correlation map .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .
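
The two steps recited in claim 5 of US8990073B2 (one-pole filtering of the correlation map per frequency bin, then summation over the bins) can be illustrated with a minimal Python sketch. The function names and the smoothing factor alpha are hypothetical and are not taken from the claim, which does not state a filter coefficient.

```python
def update_long_term_map(prev_long_term, cor_map, alpha=0.9):
    """Apply a one-pole (first-order recursive) filter to each frequency bin.

    prev_long_term: filtered map from the previous frame, one value per bin.
    cor_map:        correlation map of the current frame, one value per bin.
    alpha:          hypothetical smoothing factor (assumption, not from the claim).
    """
    if len(prev_long_term) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * p + (1.0 - alpha) * c
            for p, c in zip(prev_long_term, cor_map)]


def summed_long_term_map(long_term):
    """Sum the filtered correlation map over all frequency bins."""
    return sum(long_term)


if __name__ == "__main__":
    state = [0.0, 1.0]
    state = update_long_term_map(state, [1.0, 1.0], alpha=0.5)
    print(state, summed_long_term_map(state))
```

A larger alpha makes the long-term map react more slowly to frame-to-frame changes, which is the usual trade-off of a one-pole smoother.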

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (band signals, N subbands, readable storage) having a magnitude that exceeds a given fixed threshold .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .
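
The search recited in claim 7 of US8990073B2 (finding frequency bins of the correlation map whose magnitude exceeds a given fixed threshold) can be sketched as follows. The function name and the example threshold are hypothetical; the claim does not give a threshold value.

```python
def find_strong_tone_bins(cor_map, threshold):
    """Return the indices of bins whose correlation-map value exceeds threshold."""
    return [k for k, value in enumerate(cor_map) if value > threshold]


if __name__ == "__main__":
    # Illustrative values only; 0.95 is an assumed threshold.
    print(find_strong_tone_bins([0.1, 0.99, 0.5, 0.97], 0.95))
```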

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (transition band) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (band signals, N subbands, readable storage) when a tonal sound signal is detected .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .
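
The gating recited in claim 11 of US8990073B2 (preventing the update of noise energy estimates when a tonal sound signal is detected) can be sketched as below. The update rule itself, a per-band exponential average with factor beta, is an assumption made for illustration; the claim only requires that the update be prevented when a tonal signal is detected.

```python
def next_noise_estimates(noise, frame_energy, tonal_detected, beta=0.1):
    """Return the noise energy estimates to carry into the next frame.

    noise:          current per-band noise energy estimates.
    frame_energy:   per-band energy of the current frame.
    tonal_detected: True when the tonal stability test flags a tonal signal.
    beta:           hypothetical adaptation rate (assumption, not from the claim).
    """
    if tonal_detected:
        # Update is prevented: the previous estimates are kept unchanged.
        return list(noise)
    return [(1.0 - beta) * n + beta * e for n, e in zip(noise, frame_energy)]


if __name__ == "__main__":
    print(next_noise_estimates([1.0, 2.0], [3.0, 4.0], tonal_detected=True))
    print(next_noise_estimates([1.0, 2.0], [3.0, 4.0], tonal_detected=False, beta=0.5))
```

Freezing the estimates in this way keeps stable musical tones from being absorbed into the noise floor.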

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (band signals, N subbands, readable storage) calculated in a previous frame in a SNR calculation .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (band signals, N subbands, readable storage) for a next frame .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .
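
The frame-by-frame flow recited in claims 15 and 16 of US8990073B2 (an SNR calculation that uses noise energy estimates from the previous frame, followed by an update of the noise estimates for the next frame) can be sketched as below. The SNR formula, the 6 dB threshold, the adaptation rate beta, and the choice to update only on inactive frames are all assumptions made for illustration and are not stated in the claims.

```python
import math


def frame_snr_db(band_energy, prev_noise, floor=1e-12):
    """SNR in dB, summing per-band energy-to-noise ratios.

    prev_noise holds the noise energy estimates calculated in the previous frame.
    """
    total = sum(e / max(n, floor) for e, n in zip(band_energy, prev_noise))
    return 10.0 * math.log10(max(total, floor))


def detect_activity(frames, initial_noise, snr_threshold_db=6.0, beta=0.1):
    """Classify each frame as active (True) or inactive (False).

    snr_threshold_db and beta are hypothetical parameters.
    """
    noise = list(initial_noise)
    decisions = []
    for band_energy in frames:
        # Step 1: SNR uses the estimates carried over from the previous frame.
        active = frame_snr_db(band_energy, noise) > snr_threshold_db
        decisions.append(active)
        # Step 2: update the estimates for the next frame (assumed here to
        # happen only when the current frame is judged inactive).
        if not active:
            noise = [(1.0 - beta) * n + beta * e
                     for n, e in zip(noise, band_energy)]
    return decisions


if __name__ == "__main__":
    print(detect_activity([[1.0, 1.0], [100.0, 100.0]], [1.0, 1.0]))
```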

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (band signals, N subbands, readable storage) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (band signals, N subbands, readable storage) when a music signal (transition band) is detected .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (transition band) from a background noise signal and prevent update of noise energy estimates (band signals, N subbands, readable storage) on the music signal .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signals, N subbands, readable storage) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
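
For orientation only, the two steps recited in claim 24 can be sketched as below. The claim does not give the weights, the band indexing or any guard against division by zero, so the uniform default weights, the zero-based band index, the `eps` floor and all names are assumptions of this sketch rather than the patented method.

```python
from typing import Optional, Sequence


def spectral_diversity(
    curr_energy: Sequence[float],
    prev_energy: Sequence[float],
    min_band: int,
    weights: Optional[Sequence[float]] = None,
    eps: float = 1e-12,
) -> float:
    """Weighted sum of current/previous energy ratios over bands above min_band."""
    if len(curr_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        # Assumption: uniform weights when none are supplied.
        weights = [1.0] * len(curr_energy)
    total = 0.0
    # Only bands with an index strictly higher than the given number are used.
    for i in range(min_band + 1, len(curr_energy)):
        ratio = curr_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total


# Four bands; only bands 2 and 3 lie above min_band = 1 (made-up energies).
curr = [4.0, 4.0, 2.0, 8.0]
prev = [2.0, 2.0, 4.0, 4.0]
diversity = spectral_diversity(curr, prev, min_band=1)
```

With these values the per-band ratios are 0.5 and 2.0, so the uniform-weight sum is 2.5.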
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (band signals, N subbands, readable storage) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
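
The gating condition of claim 27 can be illustrated with the short sketch below. The threshold values, the function names and the exponential smoothing used for the noise update are hypothetical; the claim only states that the update is prevented when both parameters exceed their fixed thresholds at the same time.

```python
from typing import List, Sequence


def noise_update_prevented(
    activity_prediction: float,
    non_stationarity: float,
    activity_threshold: float,
    non_stationarity_threshold: float,
) -> bool:
    """True only when both parameters exceed their thresholds simultaneously."""
    return (
        activity_prediction > activity_threshold
        and non_stationarity > non_stationarity_threshold
    )


def maybe_update_noise(
    noise: Sequence[float],
    band_energy: Sequence[float],
    prevented: bool,
    alpha: float = 0.5,
) -> List[float]:
    """Per-band noise update; the smoothing form is an assumption of this sketch."""
    if prevented:
        return list(noise)  # estimates are left untouched
    return [alpha * n + (1.0 - alpha) * e for n, e in zip(noise, band_energy)]


# Hypothetical thresholds 0.8 and 5.0; made-up parameter values.
blocked = noise_update_prevented(0.9, 6.0, 0.8, 5.0)
allowed = noise_update_prevented(0.9, 4.0, 0.8, 5.0)
kept = maybe_update_noise([1.0, 2.0], [3.0, 4.0], prevented=blocked)
updated = maybe_update_noise([1.0, 2.0], [3.0, 4.0], prevented=allowed)
```

Here the first frame exceeds both thresholds, so its noise estimates are kept, while the second frame fails the non-stationarity test and is allowed to update.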
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signals, N subbands, readable storage) into a first group (band signals, N subbands, readable storage) of a certain number of first frequency (band signals, N subbands, readable storage) bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signals, N subbands, readable storage) value (band signals, N subbands, readable storage) for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
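
A minimal sketch of the four steps of claim 28 follows, for readers who want to see the arithmetic. The claim does not say how each group's energy value is formed or how the long-term value is computed; summing band energies and using first-order smoothing with a factor `alpha` are assumptions, as are all names and numbers.

```python
from typing import Sequence


def noise_character(band_energy: Sequence[float], n_first: int, eps: float = 1e-12) -> float:
    """Ratio of the first group's energy to the energy of the remaining bands."""
    if not 0 < n_first < len(band_energy):
        raise ValueError("n_first must leave at least one band in each group")
    first_energy = sum(band_energy[:n_first])    # first group: the first n_first bands
    second_energy = sum(band_energy[n_first:])   # second group: the rest of the bands
    return first_energy / max(second_energy, eps)


def update_long_term(previous: float, current: float, alpha: float = 0.9) -> float:
    """Assumed first-order smoothing of the noise character parameter."""
    return alpha * previous + (1.0 - alpha) * current


# Four bands split 2 + 2 (made-up energies).
param = noise_character([1.0, 3.0, 2.0, 2.0], n_first=2)
long_term = update_long_term(previous=3.0, current=param, alpha=0.5)
```

With these values both groups carry an energy of 4.0, so the parameter is 1.0, and the long-term value moves from 3.0 to 2.0.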
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (band signals, N subbands, readable storage) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, N subbands, readable storage) of the sound signal , the device comprising : means for calculating a current residual spectrum (lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
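
For orientation only, the per-peak correlation step recited in the claim above can be sketched as follows. This is a minimal illustration, not the patentee's implementation: the function name `correlation_map`, the squared normalized cross-correlation formula, and the convention that a shared minimum bin takes the value of the later peak are all assumptions that the claim text does not specify.

```python
def correlation_map(cur_res, prev_res, minima):
    """Assign one correlation value to every bin of each detected peak.

    cur_res, prev_res: residual spectra of the current and previous frame.
    minima: increasing bin indices of successive minima of cur_res; each
            pair (a, b) delimits one peak (assumed input, hypothetical name).
    """
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        cross = sum(cur_res[i] * prev_res[i] for i in bins)
        energy = (sum(cur_res[i] ** 2 for i in bins)
                  * sum(prev_res[i] ** 2 for i in bins))
        # Assumed normalization: squared cross-correlation over energies,
        # which gives a value in [0, 1]; 0 when either shape is empty.
        value = (cross * cross) / energy if energy > 0 else 0.0
        for i in bins:
            cmap[i] = value  # a shared minimum bin is overwritten by the later peak
    return cmap


# Usage: two peaks, the first of which is absent from the previous frame.
current = [0.0, 2.0, 0.0, 3.5, 0.0]
previous = [0.0, 0.0, 0.0, 3.5, 0.0]
print(correlation_map(current, previous, [0, 2, 4]))
```

A peak that keeps its shape and position from frame to frame yields a value near 1, which is the behavior a tonal (stable) component would be expected to show under these assumptions.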
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (band signals, N subbands, readable storage) of the sound signal , the device comprising : a calculator of a current residual spectrum (lower band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (lower band) comprises : a locator of the minima in the frequency spectrum (band signals, N subbands, readable storage) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
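
The three elements of the claim above (locator of minima, floor estimator, subtractor) can be illustrated with a short sketch. It rests on assumptions the claim does not state: the helper names are hypothetical, the minima are joined by straight-line (linear) interpolation, and the first and last bins are always treated as minima so that the floor spans the whole spectrum.

```python
def find_minima(spectrum):
    """Return indices of local minima; both end bins are always included."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


# Usage: two peaks sitting on a slowly rising floor.
residual, minima = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
print(minima)    # [0, 2, 4]
print(residual)  # [0.0, 2.0, 0.0, 3.5, 0.0]
```

By construction the residual is zero at every minimum, so the pieces between successive minima are the peaks that the later correlation step operates on.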
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band (current residual spectrum) into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (band signals, N subbands, readable storage) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (band signals, N subbands, readable storage) so as to produce a summed long-term correlation map .
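
The filter and adder of the claim above can be sketched as a per-bin recursive smoother followed by a sum. This is an illustration under stated assumptions: the first-order update `alpha * old + (1 - alpha) * new` is one plausible reading of "an update factor, the correlation map of a current frame, and an initial value of the long-term correlation map" in claims 30 and 31, and the names `update_long_term_map` and `alpha` are hypothetical.

```python
def update_long_term_map(long_term, cmap, alpha):
    """Smooth the correlation map bin by bin, then sum over all bins.

    long_term: long-term map from the previous frame (its initial value
               on the first frame), one entry per frequency bin.
    cmap:      correlation map of the current frame.
    alpha:     update factor in [0, 1]; larger values forget more slowly.
    """
    if len(long_term) != len(cmap):
        raise ValueError("maps must cover the same frequency bins")
    updated = [alpha * old + (1.0 - alpha) * new
               for old, new in zip(long_term, cmap)]
    return updated, sum(updated)


# Usage: start from an all-zero initial value and feed two identical frames.
state = [0.0, 0.0]
state, total = update_long_term_map(state, [1.0, 1.0], alpha=0.5)
print(state, total)  # [0.5, 0.5] 1.0
state, total = update_long_term_map(state, [1.0, 1.0], alpha=0.5)
print(state, total)  # [0.75, 0.75] 1.5
```

Under these assumptions the summed value grows toward the number of bins when peaks persist across frames, so it could be compared against a threshold to flag tonal stability.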
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (transition band) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (transition band) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (N difference) for updating noise energy estimates (band signals, N subbands, readable storage) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference (noise estimator) subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (transition band) from a background noise signal and preventing update of noise energy estimates (band signals, N subbands, readable storage) .
US6226616B1
CLAIM 3
. The multi-channel audio encoder of claim 1 , wherein said core bits define a noise floor for the reconstructed core signal across its audio bandwidth , said extension bits being allocated at frequencies near a transition band (music signal) width of said decimation LPF and above to define a noise floor for the remainder of the extension encoder's audio bandwidth .

US6226616B1
CLAIM 9
. A multi-channel audio encoder for coding a digital audio signal sampled at a known sampling rate and having an audio bandwidth , comprising : a core encoder that extracts and codes a core signal from the digital audio signal over an audio bandwidth into core bits , said core encoder including an N-band filter bank that decomposes the core signal into N subbands (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) and N subband coders that generate the core bits , N subband decoders that reconstruct the N subband samples to form a reconstructed core signal , a summing node that forms a difference signal from the reconstructed core signal and the digital audio signal in a transform or subband domain ;
and an extension encoder that encodes the difference signal into extension bits , said extension encoder matching the core encoder over its audio bandwidth and comprising , a two band filter bank that splits the digital audio signal into lower and upper bands ;
a N-band filter bank equivalent to the core encoder's that decomposes the digital audio signal in the lower band into N subbands , said summing node existing inside said extension encoder and comprising N subband nodes that subtract the reconstructed N subband samples from the digital audio signal's N subbands , respectively to form N difference subbands ;
N subband coders that code the N difference subbands to form the lower band extension bits ;
a M-band filter bank that decomposes the digital audio signal in the upper band into M subbands ;
and M subband coders that code the M subbands to form the upper band extension bits .

US6226616B1
CLAIM 12
. A multi-channel open-box audio decoder for reconstructing multiple audio channels from a bit stream , in which each audio channel was sampled at a known sampling rate and has an audio bandwidth , comprising : an unpacker for reading in and storing the bit stream a frame at a time , each of said frames including a core field having core bits and an extension fields having a sync word and extension bits , said unpacker extracting said core bits and detecting said sync word to extract and separate the extension bits ;
N core subband decoders that decode the core bits into N core subband signals (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) N extension subband decoders that decode the extension bits into a lower N extension subband signals ;
M extension subband decoders that decode the extension bits into an upper M extension subband signals ;
N summation nodes that sum the N core subband signals to the respective N extension subband signals to form N composite subband signals ;
and a filter that synthesizes the N composite subband signals and the M extension subband signals to reproduce a multi-channel audio signal .

US6226616B1
CLAIM 15
. An article of manufacture for use with an existing base of first generation audio decoders that are capable of reconstructing a core signal up to an audio bandwidth and sample resolution and a developing base of second generation audio decoders having a larger audio bandwidth , comprising : a portable machine readable storage (frequency spectrum, first energy value, first group, first frequency, first energy, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, frequency bands, first frequency bands, noise energy estimates, noise estimates, updating noise energy estimates) medium for use with said first and second generation audio decoders ;
and a single digital bit stream representing a multi-channel audio signal written onto said storage medium in a core plus extension format , said bit stream comprising a sequence of synchronized frames , each of said frames including a core field having a core sync word immediately proceeding core bits and an extension fields having an extension sync word immediately proceeding extension bits , said sequence of core bits defining a noise floor for the reconstructed core signal across the audio bandwidth of said first generation audio decoders , and said sequence of extension bits further refining the noise floor across the core encoder's audio bandwidth and defining a noise floor for the remainder of the audio bandwidth of the second generation audio decoders .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JP2000357000A

Filed: 1999-06-15     Issued: 2000-12-26

Noise signal encoding device and speech signal encoding device

(Original Assignee) Matsushita Electric Ind Co Ltd; 松下電器産業株式会社     

Koji Yoshida, 幸司 吉田
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (更新後) , the correlation map of a current frame , and an initial value of the long term correlation map .
JP2000357000A
CLAIM 8
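[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.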

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update (更新後) of noise energy estimates when a tonal sound signal is detected .
JP2000357000A
CLAIM 8
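[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.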

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (検出手段) .
JP2000357000A
CLAIM 1
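[Claim 1] A noise signal encoding device comprising: analysis means for performing signal analysis on the noise signal of a speech signal containing the noise signal; storage means for storing information about a noise model representing the noise signal; detection means 検出手段 (sound activity detection) for detecting a change in the stored information about the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating, when a change in the information about the noise model is detected, the stored information about the noise model by the amount of the change.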

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
JP2000357000A
CLAIM 1
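[Claim 1] A noise signal encoding device comprising: analysis means for performing signal analysis on the noise signal of a speech signal containing the noise signal; storage means for storing information about a noise model representing the noise signal; detection means 検出手段 (sound activity detection) for detecting a change in the stored information about the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating, when a change in the information about the noise model is detected, the stored information about the noise model by the amount of the change.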

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段) comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
JP2000357000A
CLAIM 1
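[Claim 1] A noise signal encoding device comprising: analysis means for performing signal analysis on the noise signal of a speech signal containing the noise signal; storage means for storing information about a noise model representing the noise signal; detection means 検出手段 (sound activity detection) for detecting a change in the stored information about the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating, when a change in the information about the noise model is detected, the stored information about the noise model by the amount of the change.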

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (信号又) .
JP2000357000A
CLAIM 1
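[Claim 1] A noise signal encoding device comprising: analysis means for performing signal analysis on the noise signal of a speech signal containing the noise signal; storage means for storing information about a noise model representing the noise signal; detection means 検出手段 (sound activity detection) for detecting a change in the stored information about the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating, when a change in the information about the noise model is detected, the stored information about the noise model by the amount of the change.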

JP2000357000A
CLAIM 4
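[Claim 4] A speech signal encoding device comprising: speech/noise signal separation means for separating an input speech signal into a speech signal and a background noise signal superimposed on that speech signal; voiced/silent determination means for determining, from the input speech signal or 信号又 (SNR calculation) the speech signal obtained by the speech/noise signal separation means, whether a section is a voiced section or a silent section containing only a noise signal; speech encoding means for performing speech encoding on the input speech signal when the determination result is voiced; the noise signal encoding device according to claim 1 or claim 2, which encodes the background noise signal obtained by the speech/noise signal separation means; and multiplexing means for multiplexing the outputs from the voiced/silent determination means, the speech encoding means, and the noise signal encoding device.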

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (検出手段) further comprises updating the noise estimates for a next frame .
JP2000357000A
CLAIM 1
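[Claim 1] A noise signal encoding device comprising: analysis means for performing signal analysis on the noise signal of a speech signal containing the noise signal; storage means for storing information about a noise model representing the noise signal; detection means 検出手段 (sound activity detection) for detecting a change in the stored information about the noise model based on the signal analysis result of the currently input noise signal; and updating means for updating, when a change in the information about the noise model is detected, the stored information about the noise model by the amount of the change.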

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision (更新後) based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JP2000357000A
CLAIM 8
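[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.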

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update (更新後) of noise energy estimates on the music signal .
JP2000357000A
CLAIM 8
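[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.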

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (の判定) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
JP2000357000A
CLAIM 5
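[Claim 5] A speech signal encoding device comprising: analysis means for performing signal analysis on an input speech signal; speech model storage means for storing speech feature patterns needed to determine whether the input speech signal is a voiced signal; noise model storage means for storing information about a noise model representing a noise signal contained in the input speech signal; mode determination means for determining, using the outputs of the analysis means, the speech model storage means, and the noise model storage means, whether the input speech signal is a voiced section or a silent section containing only a noise signal, and, in the case of a silent section, for making a determination の判定 (binary decision) as to whether to update the noise model; speech encoding means for performing speech encoding on the input speech signal when the mode determination means determines a voiced section; noise model updating means for updating the noise model when the mode determination means determines a silent section and determines that the noise model is to be updated; and multiplexing means for multiplexing the outputs from the speech encoding means and the noise model updating means.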

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (更新後) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2000357000A
CLAIM 8
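[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.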

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (更新後) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JP2000357000A
CLAIM 8
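[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.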

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (音声信号復号化) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
JP2000357000A
CLAIM 10
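[Claim 10] A speech signal decoding 音声信号復号化 (noise estimator) device comprising: separation means for receiving a signal containing speech data, a noise model parameter, a voiced/silent determination flag, and a noise model update flag encoded on the encoding side, and for separating the noise model parameter, the voiced/silent determination flag, and the noise model update flag from the signal; speech decoding means for performing speech decoding on the speech data when the voiced/silent determination flag indicates a voiced section; the noise signal generation device according to claim 8 or claim 9, which generates a noise signal from the noise model parameter and the noise model update flag when the voiced/silent determination flag indicates a silent section; and output switching means for switching, in accordance with the voiced/silent determination flag, between the decoded speech output from the speech decoding means and the noise signal output from the noise signal generation device, and outputting the selected one as an output signal.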

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update (更新後) of noise energy estimates .
JP2000357000A
CLAIM 8
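[Claim 8] A noise signal generation device comprising: noise model updating means for updating a noise model, when necessary, in accordance with a noise model parameter and a noise model update flag encoded on the encoding side for an input noise signal; noise model storage means for storing, using the output of the noise model updating means, information about the updated 更新後 (update factor, update decision, prevent update, preventing update) noise model; and noise signal generation means for generating a noise signal from the information about the noise model stored in the noise model storage means.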




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
CN1274456A

Filed: 1999-05-18     Issued: 2000-11-22

Speech coder

(Original Assignee) 萨里大学     

S·P·维勒特, A·M·康多兹
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语, 的因子) , the correlation map of a current frame , and an initial value of the long term correlation map (系数对应) .
CN1274456A
CLAIM 1
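. A speech coder 一种语 (update factor) comprising an encoder for encoding an input speech signal divided into frames, each frame containing a predetermined number of digital samples, the encoder including: linear predictive coding means for generating at least one set of linear prediction coefficients for each frame of samples; pitch determining means for determining at least one pitch value for each frame of samples, the pitch determining means including first estimating means for analysing the samples using a frequency-domain technique (frequency-domain analysis), second estimating means for analysing the samples using a time-domain technique (time-domain analysis), and pitch evaluating means for deriving the pitch value from the results of the frequency-domain analysis and the time-domain analysis; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the linear prediction coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices for each frame of samples; wherein the first estimating means generates a first measure for each of a number of candidate pitch values, the second estimating means generates a corresponding second measure for each of the same candidate pitch values, and the pitch evaluating means combines at least some of the first measures with the corresponding second measures and selects one of the candidate pitch values by reference to the combined results.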

CN1274456A
CLAIM 7
. A speech coder as claimed in claim 6, wherein the amplitudes are further weighted by a factor 的因子 (update factor) that increases as frequency decreases.

CN1274456A
CLAIM 45
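. A speech coder for decoding a set of quantisation indices representing LSF coefficients, a pitch value, a measure of voiced and unvoiced signals, and amplitude information, comprising: means for deriving an excitation signal from the quantisation indices representing the pitch value, the measure of voiced and unvoiced signals, and the amplitude information; an LPC synthesis filter for filtering the excitation signal in accordance with the LSF coefficients 系数对应 (term correlation map); means for comparing the pitch-cycle energy of the LPC synthesis filter output with the pitch-cycle energy of the corresponding excitation signal; means for adjusting the excitation signal to reduce the difference between the compared pitch-cycle energies; and a further LPC synthesis filter for filtering the adjusted excitation signal.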

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (背景噪声) ;

wherein the tonal stability estimation is performed according to claim 1 .
CN1274456A
CLAIM 4
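. A speech coder as claimed in claim 3, wherein the amount of weighting is further determined by the background noise 背景噪声 (background noise signal) level of the current frame.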

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update (选择前) of noise energy estimates when a tonal sound signal is detected .
CN1274456A
CLAIM 40
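. A speech coder comprising an encoder for encoding an input speech signal, the encoder including: means for sampling the input speech signal to produce digital samples and dividing the samples into frames each containing a predetermined number of digital samples; linear predictive coding means for analysing each frame of samples and generating a set of line spectral frequency (LSF) coefficients for each of the leading part and the trailing part of each frame; pitch determining means for determining at least one pitch value for each frame of samples; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the sets of LSF coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices; wherein the quantising means defines a set of quantised LSF coefficients (LSF’2) for the leading part of the current frame by the expression LSF’2 = αLSF’1 + (1-α)LSF’3, where LSF’3 and LSF’1 are the quantised LSF coefficients of the trailing part of the current frame and of the immediately preceding frame respectively, and α is a vector in a first vector quantisation codebook; defines each of the sets of LSF coefficients LSF’2 and LSF’3 for the leading and trailing parts of the current frame as a combination of the corresponding LSF quantisation vectors Q2, Q3 in a second vector quantisation codebook and the corresponding predicted values P2, P3, where P2 = λQ1, P3 = λQ2, λ is a constant, and Q1 is the LSF quantisation vector of the trailing part of the immediately preceding frame; and selects 选择前 (preventing update), from the first vector codebook and the second vector codebook respectively, the vector Q3 and the vector α so as to minimise the distortion between the LSF coefficients (LSF2, LSF3) of the current frame generated by the linear predictive coding means and the corresponding quantised LSF coefficients (LSF’2, LSF’3).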

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (预测编码) residual error energies .
CN1274456A
CLAIM 1
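. A speech coder comprising an encoder for encoding an input speech signal divided into frames, each frame containing a predetermined number of digital samples, the encoder including: linear predictive coding 线性预测编码 (linear prediction) means for generating at least one set of linear prediction coefficients for each frame of samples; pitch determining means for determining at least one pitch value for each frame of samples, the pitch determining means including first estimating means for analysing the samples using a frequency-domain technique (frequency-domain analysis), second estimating means for analysing the samples using a time-domain technique (time-domain analysis), and pitch evaluating means for deriving the pitch value from the results of the frequency-domain analysis and the time-domain analysis; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the linear prediction coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices for each frame of samples; wherein the first estimating means generates a first measure for each of a number of candidate pitch values, the second estimating means generates a corresponding second measure for each of the same candidate pitch values, and the pitch evaluating means combines at least some of the first measures with the corresponding second measures and selects one of the candidate pitch values by reference to the combined results.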

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (背景噪声) and prevent update of noise energy estimates on the music signal .
CN1274456A
CLAIM 4
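. A speech coder as claimed in claim 3, wherein the amount of weighting is further determined by the background noise 背景噪声 (background noise signal) level of the current frame.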

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (产生表征) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
CN1274456A
CLAIM 43
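. A speech coder as claimed in any one of claims 1 to 42, further comprising a decoder provided with means for decoding the quantisation indices generated by the encoder and for processing the decoded quantisation indices to generate 产生表征 (binary decision) a sequence of digital samples representing the input speech signal.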

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (的第二个) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
CN1274456A
CLAIM 1
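. A speech coder comprising an encoder for encoding an input speech signal divided into frames, each frame containing a predetermined number of digital samples, the encoder including: linear predictive coding means for generating at least one set of linear prediction coefficients for each frame of samples; pitch determining means for determining at least one pitch value for each frame of samples, the pitch determining means including first estimating means for analysing the samples using a frequency-domain technique (frequency-domain analysis), second estimating means for analysing the samples using a time-domain technique (time-domain analysis), and pitch evaluating means for deriving the pitch value from the results of the frequency-domain analysis and the time-domain analysis; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the linear prediction coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices for each frame of samples; wherein the first estimating means generates a first measure for each of a number of candidate pitch values, the second estimating means generates a corresponding second 的第二个 (first frequency) measure for each of the same candidate pitch values, and the pitch evaluating means combines at least some of the first measures with the corresponding second measures and selects one of the candidate pitch values by reference to the combined results.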
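
As a reading aid for claim 28 of US8990073B2 charted above, the following is a minimal sketch of the noise character parameter steps: split the bands into a first group and the rest, take an energy value per group, form their ratio, and track a long-term value. The function names, the use of a plain sum as the "energy value", and the smoothing constant `alpha` are illustrative assumptions; none of them is taken from the patent or from CN1274456A.

```python
def noise_character(band_energies, n_first):
    """Ratio of first-group energy to rest-of-bands energy.

    Assumption: each group's "energy value" is the plain sum of its
    per-band energies; the claim does not fix this choice.
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave at least one band in each group")
    first = sum(band_energies[:n_first])   # first group of bands
    rest = sum(band_energies[n_first:])    # second group: the remaining bands
    return first / rest if rest > 0.0 else float("inf")


def update_long_term(previous, value, alpha=0.9):
    """Hypothetical first-order smoothing used as the long-term value."""
    return alpha * previous + (1.0 - alpha) * value


ratio = noise_character([4.0, 4.0, 1.0, 1.0, 1.0, 1.0], n_first=2)
long_term = update_long_term(0.0, ratio)
```

The sketch only illustrates the order of operations recited in the claim; how the groups are sized and how the long-term value is smoothed would have to come from the specification.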

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语, 的因子) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1274456A
CLAIM 1
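. A speech coder 一种语 (update factor) comprising an encoder for encoding an input speech signal divided into frames, each frame containing a predetermined number of digital samples, the encoder including: linear predictive coding means for generating at least one set of linear prediction coefficients for each frame of samples; pitch determining means for determining at least one pitch value for each frame of samples, the pitch determining means including first estimating means for analysing the samples using a frequency-domain technique (frequency-domain analysis), second estimating means for analysing the samples using a time-domain technique (time-domain analysis), and pitch evaluating means for deriving the pitch value from the results of the frequency-domain analysis and the time-domain analysis; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the linear prediction coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices for each frame of samples; wherein the first estimating means generates a first measure for each of a number of candidate pitch values, the second estimating means generates a corresponding second measure for each of the same candidate pitch values, and the pitch evaluating means combines at least some of the first measures with the corresponding second measures and selects one of the candidate pitch values by reference to the combined results.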

CN1274456A
CLAIM 7
. A speech coder as claimed in claim 6, wherein the amplitudes are further weighted by a factor 的因子 (update factor) that increases as frequency decreases.

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (一种语, 的因子) , the correlation map of a current frame , and an initial value of the long-term correlation map .
CN1274456A
CLAIM 1
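. A speech coder 一种语 (update factor) comprising an encoder for encoding an input speech signal divided into frames, each frame containing a predetermined number of digital samples, the encoder including: linear predictive coding means for generating at least one set of linear prediction coefficients for each frame of samples; pitch determining means for determining at least one pitch value for each frame of samples, the pitch determining means including first estimating means for analysing the samples using a frequency-domain technique (frequency-domain analysis), second estimating means for analysing the samples using a time-domain technique (time-domain analysis), and pitch evaluating means for deriving the pitch value from the results of the frequency-domain analysis and the time-domain analysis; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the linear prediction coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices for each frame of samples; wherein the first estimating means generates a first measure for each of a number of candidate pitch values, the second estimating means generates a corresponding second measure for each of the same candidate pitch values, and the pitch evaluating means combines at least some of the first measures with the corresponding second measures and selects one of the candidate pitch values by reference to the combined results.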

CN1274456A
CLAIM 7
. A speech coder as claimed in claim 6, wherein the amplitudes are further weighted by a factor 的因子 (update factor) that increases as frequency decreases.

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (背景噪声) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
CN1274456A
CLAIM 4
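. A speech coder as claimed in claim 3, wherein the amount of weighting is further determined by the background noise 背景噪声 (background noise signal) level of the current frame.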

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (背景噪声) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
CN1274456A
CLAIM 4
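. A speech coder as claimed in claim 3, wherein the amount of weighting is further determined by the background noise 背景噪声 (background noise signal) level of the current frame.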

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (背景噪声) and preventing update (选择前) of noise energy estimates .
CN1274456A
CLAIM 4
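. A speech coder as claimed in claim 3, wherein the amount of weighting is further determined by the background noise 背景噪声 (background noise signal) level of the current frame.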

CN1274456A
CLAIM 40
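. A speech coder comprising an encoder for encoding an input speech signal, the encoder including: means for sampling the input speech signal to produce digital samples and dividing the samples into frames each containing a predetermined number of digital samples; linear predictive coding means for analysing each frame of samples and generating a set of line spectral frequency (LSF) coefficients for each of the leading part and the trailing part of each frame; pitch determining means for determining at least one pitch value for each frame of samples; voicing means for defining a measure of voiced and unvoiced signals in each frame; amplitude determining means for generating amplitude information for each frame of samples; and quantising means for quantising the sets of LSF coefficients, the pitch value, the measure of voiced and unvoiced signals, and the amplitude information to generate a set of quantisation indices; wherein the quantising means defines a set of quantised LSF coefficients (LSF’2) for the leading part of the current frame by the expression LSF’2 = αLSF’1 + (1-α)LSF’3, where LSF’3 and LSF’1 are the quantised LSF coefficients of the trailing part of the current frame and of the immediately preceding frame respectively, and α is a vector in a first vector quantisation codebook; defines each of the sets of LSF coefficients LSF’2 and LSF’3 for the leading and trailing parts of the current frame as a combination of the corresponding LSF quantisation vectors Q2, Q3 in a second vector quantisation codebook and the corresponding predicted values P2, P3, where P2 = λQ1, P3 = λQ2, λ is a constant, and Q1 is the LSF quantisation vector of the trailing part of the immediately preceding frame; and selects 选择前 (preventing update), from the first vector codebook and the second vector codebook respectively, the vector Q3 and the vector α so as to minimise the distortion between the LSF coefficients (LSF2, LSF3) of the current frame generated by the linear predictive coding means and the corresponding quantised LSF coefficients (LSF’2, LSF’3).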




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6351730B2

Filed: 1999-03-30     Issued: 2002-02-26

Low-complexity, low-delay, scalable and embedded speech and audio coding with adaptive frame loss concealment

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Juin-Hwey Chen
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the method comprising : calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US6351730B2
CLAIM 42
. A system for embedded coding of audio signals comprising : a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
means for performing transform computation to provide transform-domain representation (frequency spectrum) of the input audio signal in each frame , said transform-domain representation having n NB bands , where n>1 ;
means for providing a first encoded data stream corresponding to a user-specified portion of the transform-domain representation having m NB bands , where m<n , which first encoded data stream contains information sufficient to reconstruct a representation of the input audio signal ;
means for providing one or more secondary encoded data streams comprising additional information to the user-specified portion of the transform-domain representation of the input audio signal ;
and means for providing an embedded output signal based at least on said first encoded data stream and said one or more secondary encoded data streams .
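
As a reading aid for the final element of claim 1 of US8990073B2 charted above, the following is a minimal sketch of one way a long-term correlation map could be computed from an update factor, the correlation map of the current frame, and an initial value. The first-order recursive form, the all-zero initial value, and every name here (`update_long_term_map`, `alpha`) are assumptions for illustration; the claim language does not fix the exact recursion, and nothing here is drawn from US6351730B2.

```python
def update_long_term_map(lt_map, cor_map, alpha):
    """One frame of a per-bin recursive update (assumed form).

    lt_map  : long-term correlation map carried over from earlier frames
    cor_map : correlation map of the current frame
    alpha   : update factor in [0, 1]; larger values mean slower adaptation
    """
    if len(lt_map) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cor_map)]


# Assumed initial value: an all-zero long-term map.
lt = [0.0, 0.0, 0.0]
for frame_map in ([1.0, 0.0, 0.5], [1.0, 0.0, 0.5]):
    lt = update_long_term_map(lt, frame_map, alpha=0.9)
```

Under this assumed form, bins whose correlation stays high across frames accumulate a high long-term value, which is the behaviour the claim associates with tonal stability.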

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (sample values) comprises : searching for the minima in the frequency spectrum (domain representation) of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US6351730B2
CLAIM 42
. A system for embedded coding of audio signals comprising : a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
means for performing transform computation to provide transform-domain representation (frequency spectrum) of the input audio signal in each frame , said transform-domain representation having n NB bands , where n>1 ;
means for providing a first encoded data stream corresponding to a user-specified portion of the transform-domain representation having m NB bands , where m<n , which first encoded data stream contains information sufficient to reconstruct a representation of the input audio signal ;
means for providing one or more secondary encoded data streams comprising additional information to the user-specified portion of the transform-domain representation of the input audio signal ;
and means for providing an embedded output signal based at least on said first encoded data stream and said one or more secondary encoded data streams .
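
As a reading aid for claim 2 of US8990073B2 charted above, the following is a minimal sketch of the three recited steps: search for minima in the spectrum, estimate the spectral floor by connecting the minima, and subtract the floor. Connecting minima by straight lines, treating the two end bins as minima, and the function names are assumptions for illustration only.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are always included (assumption)."""
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through successive minima."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)  # position between the two minima
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0])
```

In this sketch the residual is zero at every minimum, so the peaks of claim 1 appear as the pieces lying between those zeros.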

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (sample values) comprises locating a maximum between each pair of two consecutive minima (successive time intervals) of the current residual spectrum .
US6351730B2
CLAIM 1
. A system for processing audio signals comprising : (a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals (consecutive minima) ;
(b) a transform processor for performing transform computation of the input audio signal in at least one signal frame , said transform processor generating a transform signal having one or more (NB) bands ;
(c) a quantizer providing quantized values associated with the transform signal in said NB bands ;
(d) an output processor for forming an output bit stream corresponding to an encoded version of the input audio signal ;
and (e) a decoder capable of reconstructing from the output bit stream at least two replicas of the input audio signal , each replica having a different sampling rate , without using downsampling .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .
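
As a reading aid for claim 3 of US8990073B2 charted above, the following is a minimal sketch of peak detection in which each peak is the piece of the residual spectrum between two consecutive minima and the maximum is located inside that piece. The minima rule (end bins included) and the names are illustrative assumptions.

```python
def local_minima(x):
    """Indices of local minima; the end bins are always included (assumption)."""
    n = len(x)
    idx = [0]
    for i in range(1, n - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def detect_peaks(residual):
    """Return (start_bin, end_bin, max_bin) for each piece between minima."""
    minima = local_minima(residual)
    peaks = []
    for a, b in zip(minima, minima[1:]):
        # Locate the maximum between the pair of consecutive minima.
        k_max = max(range(a, b + 1), key=lambda k: residual[k])
        peaks.append((a, b, k_max))
    return peaks


peaks = detect_peaks([0.0, 2.0, 0.0, 3.5, 0.0])
```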

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (sample values) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (successive time intervals) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6351730B2
CLAIM 1
. A system for processing audio signals comprising : (a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals (consecutive minima) ;
(b) a transform processor for performing transform computation of the input audio signal in at least one signal frame , said transform processor generating a transform signal having one or more (NB) bands ;
(c) a quantizer providing quantized values associated with the transform signal in said NB bands ;
(d) an output processor for forming an output bit stream corresponding to an encoded version of the input audio signal ;
and (e) a decoder capable of reconstructing from the output bit stream at least two replicas of the input audio signal , each replica having a different sampling rate , without using downsampling .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (sampling rate) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6351730B2
CLAIM 1
. A system for processing audio signals comprising : (a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
(b) a transform processor for performing transform computation of the input audio signal in at least one signal frame , said transform processor generating a transform signal having one or more (NB) bands ;
(c) a quantizer providing quantized values associated with the transform signal in said NB bands ;
(d) an output processor for forming an output bit stream corresponding to an encoded version of the input audio signal ;
and (e) a decoder capable of reconstructing from the output bit stream at least two replicas of the input audio signal , each replica having a different sampling rate (frequency bin basis) , without using downsampling .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (Discrete Cosine Transform) .
US6351730B2
CLAIM 14
. The method of claim 11 wherein said at least two relatively short-size transforms are Modified Discrete Cosine Transforms (sound activity detection) (MDCTs) .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (Discrete Cosine Transform) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US6351730B2
CLAIM 14
. The method of claim 11 wherein said at least two relatively short-size transforms are Modified Discrete Cosine Transforms (sound activity detection) (MDCTs) .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (Discrete Cosine Transform) comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6351730B2
CLAIM 14
. The method of claim 11 wherein said at least two relatively short-size transforms are Modified Discrete Cosine Transforms (sound activity detection) (MDCTs) .
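
For comparison with the mapped MDCT language, the decision recited in claim 14 of US8990073B2 (comparing an average SNR to a threshold that depends on a long-term SNR) might look like the sketch below. The claim does not give the threshold function; the linear form and the `slope` and `offset` values are purely hypothetical placeholders.

```python
def snr_threshold(snr_lt: float, slope: float = 0.5, offset: float = 2.0) -> float:
    """Threshold as a function of the long-term SNR.

    Assumption: a linear function with made-up coefficients; the claim only
    requires that the threshold be some function of SNR_LT.
    """
    return slope * snr_lt + offset


def is_sound_active(snr_av: float, snr_lt: float) -> bool:
    """Declare sound activity when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt)
```

Under these placeholder values a long-term SNR of 20 yields a threshold of 12, so an average SNR of 15 is flagged as active and an average SNR of 5 is not.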

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (Discrete Cosine Transform) in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio, Noise Ratio) .
US6351730B2
CLAIM 5
. The system of claim 3 wherein said bit allocator warps possibly quantized log-gain values to target signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (TSNR) values in the base-2 log domain using a predefined warping function .

US6351730B2
CLAIM 14
. The method of claim 11 wherein said at least two relatively short-size transforms are Modified Discrete Cosine Transforms (sound activity detection) (MDCTs) .

US6351730B2
CLAIM 21
. The method of claim 20 wherein prior to bit allocation , the NB log-gains are mapped to a Target Signal to Noise Ratio (noise ratio, SNR LT, SNR calculation) (TSNR) scale using a warping curve .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (Discrete Cosine Transform) further comprises updating the noise estimates for a next frame .
US6351730B2
CLAIM 14
. The method of claim 11 wherein said at least two relatively short-size transforms are Modified Discrete Cosine Transforms (sound activity detection) (MDCTs) .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction residual error energies .
US6351730B2
CLAIM 38
. A coding method for use in processing of audio signals divided into frames corresponding to successive time intervals , where for each input frame at least one transform domain computation is performed , and the transform coefficients are divided into NB bands , the method comprising : computing a base-2 logarithm of the average power of the transform coefficients in the NB bands to obtain a log-gain array LG(i) , i=0 , . . . , NB−1 ;
encoding information about each frame based on the log-gain array LG(i) , said encoded information comprising the transform coefficients , where the encoding step comprises : computing a quantized log-gain array LGQ(i) , i=0 , . . . , NB−1 ;
and converting the quantized log-gain coefficients of the array LGQ(i) into a linear-gain domain using the following steps (second order) : (1) providing a table containing all possible values of the linear gain g(0) corresponding to the number of bits allocated to LGQ(0) ;
(2) finding the value of g(0) using table lookup ;
(3) from the second band onward , applying the formula : $g(i) = 2^{LGQ(i)/2} = 2^{\frac{1}{2}[DLGQ(i)+LGQ(i-1)]} = 2^{LGQ(i-1)/2} \times 2^{DLGQ(i)/2} = g(i-1) \times 2^{DLGQ(i)/2}$ to compute recursively all linear gains using a single multiplication per linear gain , where each of the quantities $2^{DLGQ(i)/2}$ are found using table lookup ;
and decoding said encoded information about each frame to reconstruct the input audio signal .
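
The recursion in step (3) of claim 38 of US6351730B2 can be illustrated with a short sketch. It assumes DLGQ(i) = LGQ(i) - LGQ(i-1), as implied by the formula, and computes the powers of two directly where the claim uses table lookup; the function name is hypothetical.

```python
from typing import List


def linear_gains(lgq: List[float]) -> List[float]:
    """Convert quantized base-2 log-gains LGQ(i) to linear gains g(i)."""
    if not lgq:
        return []
    # g(0) = 2^(LGQ(0)/2); computed here, obtained by table lookup in the claim.
    gains = [2.0 ** (lgq[0] / 2.0)]
    for i in range(1, len(lgq)):
        dlgq = lgq[i] - lgq[i - 1]  # differential log-gain DLGQ(i)
        # One multiplication per band: g(i) = g(i-1) * 2^(DLGQ(i)/2).
        gains.append(gains[-1] * 2.0 ** (dlgq / 2.0))
    return gains
```

For log-gains `[2, 4, 0]` the recursion gives the same result as evaluating 2^(LGQ(i)/2) band by band, namely `[2.0, 4.0, 1.0]`.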

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (wise linear function) parameter (wise linear function) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6351730B2
CLAIM 22
. The method of claim 21 wherein the warping curve is a piece-wise linear function (noise character parameter, noise character) .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and at least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands (frequency bands, second band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and at least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 18
. The method of claim 15 wherein transform coefficients T(k , m) obtained by each of said at least two transform computations are divided into NB frequency bands (frequency bands, first frequency bands) , and encoding information about each frame is done using the base-2 logarithm of the average power of the coefficients in the NB bands , said base-2 logarithm of the average power being defined as the log-gain .

US6351730B2
CLAIM 38
. A coding method for use in processing of audio signals divided into frames corresponding to successive time intervals , where for each input frame at least one transform domain computation is performed , and the transform coefficients are divided into NB bands , the method comprising : computing a base-2 logarithm of the average power of the transform coefficients in the NB bands to obtain a log-gain array LG(i) , i=0 , . . . , NB−1 ;
encoding information about each frame based on the log-gain array LG(i) , said encoded information comprising the transform coefficients , where the encoding step comprises : computing a quantized log-gain array LGQ(i) , i=0 , . . . , NB−1 ;
and converting the quantized log-gain coefficients of the array LGQ(i) into a linear-gain domain using the following steps : (1) providing a table containing all possible values of the linear gain g(0) corresponding to the number of bits allocated to LGQ(0) ;
(2) finding the value of g(0) using table lookup ;
(3) from the second band (frequency bands, first frequency bands) onward , applying the formula : $g(i) = 2^{LGQ(i)/2} = 2^{\frac{1}{2}[DLGQ(i)+LGQ(i-1)]} = 2^{LGQ(i-1)/2} \times 2^{DLGQ(i)/2} = g(i-1) \times 2^{DLGQ(i)/2}$ to compute recursively all linear gains using a single multiplication per linear gain , where each of the quantities $2^{DLGQ(i)/2}$ are found using table lookup ;
and decoding said encoded information about each frame to reconstruct the input audio signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (wise linear function) parameter (wise linear function) comprises : dividing a plurality of frequency bands (frequency bands, second band) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6351730B2
CLAIM 18
. The method of claim 15 wherein transform coefficients T(k , m) obtained by each of said at least two transform computations are divided into NB frequency bands (frequency bands, first frequency bands) , and encoding information about each frame is done using the base-2 logarithm of the average power of the coefficients in the NB bands , said base-2 logarithm of the average power being defined as the log-gain .

US6351730B2
CLAIM 22
. The method of claim 21 wherein the warping curve is a piece-wise linear function (noise character parameter, noise character) .

US6351730B2
CLAIM 38
. A coding method for use in processing of audio signals divided into frames corresponding to successive time intervals , where for each input frame at least one transform domain computation is performed , and the transform coefficients are divided into NB bands , the method comprising : computing a base-2 logarithm of the average power of the transform coefficients in the NB bands to obtain a log-gain array LG(i) , i=0 , . . . , NB−1 ;
encoding information about each frame based on the log-gain array LG(i) , said encoded information comprising the transform coefficients , where the encoding step comprises : computing a quantized log-gain array LGQ(i) , i=0 , . . . , NB−1 ;
and converting the quantized log-gain coefficients of the array LGQ(i) into a linear-gain domain using the following steps : (1) providing a table containing all possible values of the linear gain g(0) corresponding to the number of bits allocated to LGQ(0) ;
(2) finding the value of g(0) using table lookup ;
(3) from the second band (frequency bands, first frequency bands) onward , applying the formula : $g(i) = 2^{LGQ(i)/2} = 2^{\frac{1}{2}[DLGQ(i)+LGQ(i-1)]} = 2^{LGQ(i-1)/2} \times 2^{DLGQ(i)/2} = g(i-1) \times 2^{DLGQ(i)/2}$ to compute recursively all linear gains using a single multiplication per linear gain , where each of the quantities $2^{DLGQ(i)/2}$ are found using table lookup ;
and decoding said encoded information about each frame to reconstruct the input audio signal .
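
To make the comparison with claim 28 of US8990073B2 concrete, the noise character parameter recited there (an energy ratio between two groups of frequency bands, tracked over the long term) might be sketched as follows. The direction of the ratio (first group over second group), the smoothing form, and the `alpha` value are assumptions not stated in the claim.

```python
from typing import List


def noise_character(band_energies: List[float], n_first: int) -> float:
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Assumption: the ratio is first group over second group.
    """
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / rest if rest > 0 else float("inf")


def update_long_term(prev_lt: float, value: float, alpha: float = 0.9) -> float:
    """Long-term value as a first-order recursive average (assumed form)."""
    return alpha * prev_lt + (1.0 - alpha) * value
```

A caller would compute `noise_character` once per frame and feed the result into `update_long_term`, then compare the long-term value with a fixed threshold as in claim 29.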

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (wise linear function) parameter (wise linear function) inferior than a given fixed threshold .
US6351730B2
CLAIM 22
. The method of claim 21 wherein the warping curve is a piece-wise linear function (noise character parameter, noise character) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : means for calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and at least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US6351730B2
CLAIM 42
. A system for embedded coding of audio signals comprising : a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
means for performing transform computation to provide transform-domain representation (frequency spectrum) of the input audio signal in each frame , said transform-domain representation having n NB bands , where n>1 ;
means for providing a first encoded data stream corresponding to a user-specified portion of the transform-domain representation having m NB bands , where m<n , which first encoded data stream contains information sufficient to reconstruct a representation of the input audio signal ;
means for providing one or more secondary encoded data streams comprising additional information to the user-specified portion of the transform-domain representation of the input audio signal ;
and means for providing an embedded output signal based at least on said first encoded data stream and said one or more secondary encoded data streams .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (domain representation) of the sound signal , the device comprising : a calculator of a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and at least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US6351730B2
CLAIM 42
. A system for embedded coding of audio signals comprising : a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
means for performing transform computation to provide transform-domain representation (frequency spectrum) of the input audio signal in each frame , said transform-domain representation having n NB bands , where n>1 ;
means for providing a first encoded data stream corresponding to a user-specified portion of the transform-domain representation having m NB bands , where m<n , which first encoded data stream contains information sufficient to reconstruct a representation of the input audio signal ;
means for providing one or more secondary encoded data streams comprising additional information to the user-specified portion of the transform-domain representation of the input audio signal ;
and means for providing an embedded output signal based at least on said first encoded data stream and said one or more secondary encoded data streams .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (sample values) comprises : a locator of the minima in the frequency spectrum (domain representation) of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6351730B2
CLAIM 12
. The method of claim 11 using M transforms for each signal frame , said transforms performed over partially overlapping windows which cover the audio signal in a current frame (current frame) and at least one adjacent frame , wherein the overlapping portion is equal to 1/M of the frame size .

US6351730B2
CLAIM 31
. The method of claim 30 wherein a measure of discontinuities is computed in terms of both waveform sample values (current residual spectrum) and waveform slope .

US6351730B2
CLAIM 42
. A system for embedded coding of audio signals comprising : a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
means for performing transform computation to provide transform-domain representation (frequency spectrum) of the input audio signal in each frame , said transform-domain representation having n NB bands , where n>1 ;
means for providing a first encoded data stream corresponding to a user-specified portion of the transform-domain representation having m NB bands , where m<n , which first encoded data stream contains information sufficient to reconstruct a representation of the input audio signal ;
means for providing one or more secondary encoded data streams comprising additional information to the user-specified portion of the transform-domain representation of the input audio signal ;
and means for providing an embedded output signal based at least on said first encoded data stream and said one or more secondary encoded data streams .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (sampling rate) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6351730B2
CLAIM 1
. A system for processing audio signals comprising : (a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals ;
(b) a transform processor for performing transform computation of the input audio signal in at least one signal frame , said transform processor generating a transform signal having one or more (NB) bands ;
(c) a quantizer providing quantized values associated with the transform signal in said NB bands ;
(d) an output processor for forming an output bit stream corresponding to an encoded version of the input audio signal ;
and (e) a decoder capable of reconstructing from the output bit stream at least two replicas of the input audio signal , each replica having a different sampling rate (frequency bin basis) , without using downsampling .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio, Noise Ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6351730B2
CLAIM 5
. The system of claim 3 wherein said bit allocator warps possibly quantized log-gain values to target signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) (TSNR) values in the base-2 log domain using a predefined warping function .

US6351730B2
CLAIM 21
. The method of claim 20 wherein prior to bit allocation , the NB log-gains are mapped to a Target Signal to Noise Ratio (noise ratio, SNR LT, SNR calculation) (TSNR) scale using a warping curve .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (domain signal) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6351730B2
CLAIM 16
. The method of claim 15 wherein each transform includes a DCT type IV transform computation , given by the expression : $X_k = \sqrt{\frac{2}{M}} \sum_{n=0}^{M-1} x_n \cos\left[\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]$ where $x_n$ is the time domain signal (noise estimator) , $X_k$ is the DCT type IV transform of $x_n$ , and M is the transform size .
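
The DCT type IV expression in claim 16 of US6351730B2 can be checked numerically with a direct O(M^2) sketch. It assumes the orthonormal scaling sqrt(2/M), under which the transform is its own inverse; a practical codec would use a fast algorithm instead, and the function name is hypothetical.

```python
import math
from typing import List


def dct_iv(x: List[float]) -> List[float]:
    """Direct DCT type IV of x, assuming the orthonormal scale sqrt(2/M)."""
    m = len(x)
    scale = math.sqrt(2.0 / m)
    return [
        scale * sum(
            x[n] * math.cos((n + 0.5) * (k + 0.5) * math.pi / m)
            for n in range(m)
        )
        for k in range(m)
    ]
```

With this scaling, applying `dct_iv` twice returns the original samples up to rounding error, which is a convenient sanity check on the reconstructed formula.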

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (wise linear function) of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6351730B2
CLAIM 22
. The method of claim 21 wherein the warping curve is a piece-wise linear function (noise character parameter, noise character) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6363345B1

Filed: 1999-02-18     Issued: 2002-03-26

System, method and apparatus for cancelling noise

(Original Assignee) Andrea Electronics Corp     (Current Assignee) Andrea Electronics Corp

Joseph Marash, Baruch Berdugo
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum (frequency spectrum) generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum (frequency spectrum) generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .
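
For contrast with the per-bin threshold detector of US6363345B1, the residual-spectrum computation of claim 2 of US8990073B2 might be sketched as follows. Connecting the minima by straight lines and treating the end bins as minima are assumptions; the claim only says the minima are connected with each other.

```python
from typing import List


def local_minima(spectrum: List[float]) -> List[int]:
    """Indices of local minima; end bins are included by assumption."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def spectral_floor(spectrum: List[float]) -> List[float]:
    """Piecewise-linear floor obtained by connecting consecutive minima."""
    floor = [float(s) for s in spectrum]
    minima = local_minima(spectrum)
    for lo, hi in zip(minima, minima[1:]):
        for i in range(lo, hi + 1):
            t = (i - lo) / (hi - lo)
            floor[i] = spectrum[lo] + t * (spectrum[hi] - spectrum[lo])
    return floor


def residual_spectrum(spectrum: List[float]) -> List[float]:
    """Spectrum minus its spectral floor; zero at every minimum."""
    return [s - f for s, f in zip(spectrum, spectral_floor(spectrum))]
```

Because the floor passes through every minimum, the residual is zero at the minima and non-negative elsewhere, so each peak appears as a separate lobe.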

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (noise estimation) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation (frequency bins) process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .
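
The correlation map of claim 4 of US8990073B2 assigns one normalized correlation value per peak to all bins between the minima that delimit it. A minimal sketch, assuming the normalization is the cross term divided by the geometric mean of the two energies (the claim does not fix the exact formula) and that `minima` is supplied by an earlier step:

```python
import math
from typing import List


def correlation_map(curr: List[float], prev: List[float],
                    minima: List[int]) -> List[float]:
    """Per-bin map holding each peak's normalized correlation score."""
    cmap = [0.0] * len(curr)
    for lo, hi in zip(minima, minima[1:]):
        bins = range(lo, hi + 1)
        cross = sum(curr[i] * prev[i] for i in bins)
        e_curr = sum(curr[i] ** 2 for i in bins)
        e_prev = sum(prev[i] ** 2 for i in bins)
        if e_curr > 0 and e_prev > 0:
            score = cross / math.sqrt(e_curr * e_prev)
        else:
            score = 0.0  # no energy in one of the frames: no correlation
        for i in bins:
            # A minimum shared by two peaks keeps the later peak's score.
            cmap[i] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores 1.0 across its bins, while a peak with no counterpart in the previous frame scores 0.0.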

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation (frequency bins) process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .
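
The long-term correlation map of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over bins) might be sketched as below. The filter form and the name `alpha` for the update factor are assumptions; the claims do not give numeric values.

```python
from typing import List


def update_long_term_map(lt_map: List[float], cmap: List[float],
                         alpha: float) -> List[float]:
    """One-pole (first-order recursive) filter applied independently per bin.

    lt_map holds the previous long-term map (its initial value on the
    first frame); alpha is the update factor, assumed to lie in [0, 1).
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cmap)]


def summed_long_term_map(lt_map: List[float]) -> float:
    """Sum of the filtered map over all frequency bins."""
    return sum(lt_map)
```

Starting from an all-zero map, repeated frames with a correlation map of 1.0 drive every bin toward 1.0, so the summed value grows for tonally stable input.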

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (noise estimation) having a magnitude that exceeds a given fixed threshold .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation (frequency bins) process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (minimum values) indicative of sound activity in the sound signal .
US6363345B1
CLAIM 22
. The apparatus according to claim 21 , wherein said estimator estimates said magnitude of each frequency bin as a function of the maximum and the minimum values (adaptive threshold) of the complex element of said frequency bins for a number n of frequency bins .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum (frequency spectrum) generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum (frequency spectrum) generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum (frequency spectrum) generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
US6363345B1
CLAIM 1
. An apparatus for canceling noise , comprising : an input for inputting an audio signal which includes a noise signal ;
a frequency spectrum generator for generating the frequency spectrum of said audio signal thereby generating frequency bins of said audio signal ;
and a threshold detector for setting a threshold for each frequency bin using a noise estimation (frequency bins) process and for detecting for each frequency bin whether the magnitude of the frequency bin is less than the corresponding threshold , thereby detecting the position of noise elements for each frequency bin .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6381570B2

Filed: 1999-02-12     Issued: 2002-04-30

Adaptive two-threshold method for discriminating noise from speech in a communication signal

(Original Assignee) Telogy Networks Inc     (Current Assignee) Telogy Networks Inc

Dunling Li, Zoran Mladenovic, Bogdan Kosanovic
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .
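
The block-energy and recursive-update steps recited in US6381570B2 claim 1 can be sketched as below. This is an illustration only: the function names, the sample blocks, and the time constant of 0.5 are assumptions, and the rule that decides whether a block updates the voice estimate or the noise estimate is not modelled.

```python
def block_energy(samples):
    """Block energy: sum of the squares of the sample values in one block."""
    return sum(s * s for s in samples)


def update_estimate(previous, current_block_energy, alpha):
    """Recursive update E_n = (1 - alpha) * E_{n-1} + alpha * E_block."""
    return (1.0 - alpha) * previous + alpha * current_block_energy


# Hypothetical update period made of three short blocks.
blocks = [[1.0, -1.0], [2.0, 0.0], [0.5, 0.5]]
energies = [block_energy(b) for b in blocks]

# E max and E min over the update period, as in the claim.
e_max, e_min = max(energies), min(energies)

# Run the recursion over the blocks from an assumed starting value of 0.0.
estimate = 0.0
for energy in energies:
    estimate = update_estimate(estimate, energy, alpha=0.5)
```

Note that this recursion operates on a scalar block energy in the time domain; it involves no frequency bins or residual spectrum.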

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (sample values) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .
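
The three steps of claim 2 (search for minima, connect them into a spectral floor, subtract) can be sketched as follows. Several choices here are assumptions not stated in the claim: minima are strict interior local minima, the minima are connected by straight lines, the floor is held constant outside the first and last minimum, and a spectrum with no minima gets a zero floor. All function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of strict interior local minima (assumed definition)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] < spectrum[i + 1]]


def spectral_floor(spectrum, minima):
    """Piecewise-linear floor through the minima (assumed way of connecting them)."""
    if not minima:
        return [0.0] * len(spectrum)
    first, last = minima[0], minima[-1]
    # Hold the floor constant outside the outermost minima.
    floor = [spectrum[first] if i <= first else spectrum[last]
             for i in range(len(spectrum))]
    # Interpolate linearly between each pair of successive minima.
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Spectrum minus its estimated spectral floor."""
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]


# Illustrative spectrum; the values are made up.
residual = residual_spectrum([5.0, 2.0, 6.0, 4.0, 8.0, 3.0, 7.0])
```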

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (sample values) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .
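
Claim 3 locates a maximum between each pair of two consecutive minima of the residual spectrum. One possible reading is sketched below, assuming strict interior local minima and taking the first bin that attains the maximum when several tie. The function names and the example residual are hypothetical.

```python
def local_minima(values):
    """Indices of strict interior local minima (assumed definition)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


def detect_peaks(residual):
    """Return (left_minimum, right_minimum, maximum_index) for each peak."""
    minima = local_minima(residual)
    peaks = []
    for lo, hi in zip(minima, minima[1:]):
        # The peak is the piece of the residual between two successive minima;
        # its maximum is the largest value inside that piece.
        top = max(range(lo, hi + 1), key=lambda i: residual[i])
        peaks.append((lo, hi, top))
    return peaks


# Illustrative residual spectrum; the values are made up.
peaks = detect_peaks([3.0, 0.0, 3.0, 0.0, 4.5, 0.0, 4.0])
```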

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (sample values) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (relative values) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values (frequency bins) of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .
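
The correlation-map construction of claim 4 can be illustrated with the sketch below. The claim does not give the correlation formula, so the common normalised cross-correlation is assumed here; a zero denominator is mapped to 0.0, and a minimum bin shared by two adjacent peaks keeps the value of the later peak. The function names and the peak list format (pairs of delimiting minima) are hypothetical.

```python
import math


def normalized_correlation(current, previous, lo, hi):
    """Normalised correlation over bins lo..hi inclusive (assumed formula)."""
    cur = current[lo:hi + 1]
    prev = previous[lo:hi + 1]
    numerator = sum(c * p for c, p in zip(cur, prev))
    denominator = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
    return numerator / denominator if denominator > 0.0 else 0.0


def correlation_map(current, previous, peaks):
    """Assign each peak's correlation score to every bin the peak spans."""
    cor_map = [0.0] * len(current)
    for lo, hi in peaks:
        score = normalized_correlation(current, previous, lo, hi)
        for k in range(lo, hi + 1):
            cor_map[k] = score
    return cor_map


# Two identical frames give a score of 1.0 on every bin covered by a peak.
frame = [3.0, 0.0, 3.0, 0.0, 4.5, 0.0, 4.0]
cor_map = correlation_map(frame, frame, peaks=[(1, 3), (3, 5)])
```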

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (relative values) so as to produce a summed long-term correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values (frequency bins) of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .
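
Claim 5 filters the correlation map through a one-pole filter on a bin-by-bin basis and then sums over the bins. A minimal sketch follows, assuming the filter form y[k] = alpha * y_prev[k] + (1 - alpha) * x[k] with an all-zero initial map. The filter form, the update factor of 0.5, and the function names are assumptions, not taken from the patent.

```python
def update_long_term_map(long_term, cor_map, alpha):
    """One-pole filter applied independently to each frequency bin (assumed form)."""
    return [alpha * prev + (1.0 - alpha) * cur
            for prev, cur in zip(long_term, cor_map)]


def summed_long_term_map(long_term):
    """Sum the filtered correlation map over all frequency bins."""
    return sum(long_term)


# Assumed initial value: an all-zero long-term map.
long_term = [0.0, 0.0, 0.0]

# Two illustrative frames with a stable correlation pattern.
for frame_map in ([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]):
    long_term = update_long_term_map(long_term, frame_map, alpha=0.5)

total = summed_long_term_map(long_term)
```

In this sketch alpha weights the previous state, whereas the time constants in the US6381570B2 formulas weight the new block energy. The US6381570B2 recursion also runs on a single scalar energy rather than on each frequency bin.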

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (relative values) having a magnitude that exceeds a given fixed threshold .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values (frequency bins) of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (energy threshold) energies .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold (residual error) value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (sample values) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (sample values) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values (current residual spectrum) ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value E_voice is updated according to the formula : E_{voice,n} = (1 − α_voice) * E_{voice,n−1} + α_voice * E_n , where E_{voice,n} is said voice energy estimation value for said current block period , α_voice is a voice time constant , E_{voice,n−1} is said voice energy estimation value for an immediately preceding voice block period , and E_n is said current block energy ;
and said noise energy estimation value E_noise is updated according to the formula : E_{noise,n} = (1 − α_noise) * E_{noise,n−1} + α_noise * E_n , where E_{noise,n} is said noise energy estimation value for said current block period , α_noise is a noise time constant , E_{noise,n−1} is said noise energy estimation value for an immediately preceding noise block period , E_n is said current block energy .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (relative values) so as to produce a summed long-term correlation map .
US6381570B2
CLAIM 1
. A method of discriminating noise and voice energy in a communication signal , comprising the steps of : for a plurality of block periods : sampling said signal a number of times to obtain sample values ;
calculating a block energy value for said signal by summing the squares of said sample values from said number of samples ;
and for an update period equal to a sum of said plurality of block periods : assigning a maximum block energy value calculated during said update period to a variable E max ;
assigning a minimum block energy value calculated during said update period to a variable E min ;
calculating a noise energy threshold value based on the relative values (frequency bins) of E max and E min , wherein between a first upper bound and a first lower bound said noise energy threshold may assume a continuum of values ;
calculating a voice energy threshold value based on the relative values of E max and E min , wherein between a second upper bound and a second lower bound said voice energy threshold may assume a continuum of values ;
and updating said noise energy threshold and said voice energy threshold in accordance with said calculations for their respective values ;
said voice energy estimation value $E_{\mathrm{voice}}$ is updated according to the formula : $E_{\mathrm{voice},n} = (1-\alpha_{\mathrm{voice}})\,E_{\mathrm{voice},n-1} + \alpha_{\mathrm{voice}}\,E_n$ , where $E_{\mathrm{voice},n}$ is said voice energy estimation value for said current block period , $\alpha_{\mathrm{voice}}$ is a voice time constant , $E_{\mathrm{voice},n-1}$ is said voice energy estimation value for an immediately preceding voice block period , and $E_n$ is said current block energy ;
and said noise energy estimation value $E_{\mathrm{noise}}$ is updated according to the formula : $E_{\mathrm{noise},n} = (1-\alpha_{\mathrm{noise}})\,E_{\mathrm{noise},n-1} + \alpha_{\mathrm{noise}}\,E_n$ , where $E_{\mathrm{noise},n}$ is said noise energy estimation value for said current block period , $\alpha_{\mathrm{noise}}$ is a noise time constant , $E_{\mathrm{noise},n-1}$ is said noise energy estimation value for an immediately preceding noise block period , $E_n$ is said current block energy .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6680972B1

Filed: 1999-02-09     Issued: 2004-01-20

Source coding enhancement using spectral-band replication

(Original Assignee) Coding Technologies Sweden AB     (Current Assignee) LARS GUSAF LILJERYD ; Dolby International AB

Lars Gustaf Liljeryd, Per Rune Albin Ekstrand, Lars Fredrik Henn, Hans Magnus Kristofer Kjörling
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency bands, band signals, band channel) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
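
For orientation, the peak-detection element of US8990073B2 claim 1 above (peaks as pieces of the residual spectrum between pairs of successive minima) can be sketched as below. This is a hedged illustration only: the helper names are hypothetical, the input is assumed to have at least two bins, and treating the first and last bins as minima is an assumption made so that every bin belongs to some segment.

```python
def local_minima(spectrum):
    """Indices of local minima; first and last bins are included (assumption)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def peak_segments(residual):
    """Each peak is the piece of the spectrum between two successive minima."""
    minima = local_minima(residual)
    return list(zip(minima, minima[1:]))


# Hypothetical residual spectrum with two peaks.
residual = [0.0, 2.0, 5.0, 1.0, 0.0, 3.0, 4.0, 0.0]
segments = peak_segments(residual)
```

Each returned pair holds the indices of the two minima that delimit one peak, which is the form the later correlation steps operate on.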

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency bands, band signals, band channel) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
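
The three steps of US8990073B2 claim 2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the spectrum is assumed to have at least two bins, and "connecting the minima" is read here as straight-line interpolation between neighbouring minima, which the claim itself does not specify.

```python
def local_minima(spectrum):
    """Indices of local minima; first and last bins are included (assumption)."""
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum):
    """Piecewise-linear floor through the minima (interpolation is an assumption)."""
    minima = local_minima(spectrum)
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Residual spectrum: the spectrum minus its estimated floor."""
    floor = spectral_floor(spectrum)
    return [s - f for s, f in zip(spectrum, floor)]


# Hypothetical 5-bin spectrum of the current frame.
spectrum = [1.0, 4.0, 2.0, 6.0, 3.0]
residual = residual_spectrum(spectrum)
```

By construction the residual is zero at every minimum, so the peaks stand out from a flat baseline.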

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (frequency bands, band signals, band channel) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
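
The correlation-map construction of US8990073B2 claim 4 above can be illustrated as follows. This is a sketch, not the patent's formula: the claim only requires "a normalized correlation value", so the cosine-style normalization used here is an assumption, as are the function name, the zero score for a silent segment, and the rule that a boundary bin shared by two peaks takes the later peak's score.

```python
import math


def correlation_map(current, previous, segments):
    """Assign each peak's normalized correlation to all bins of that peak."""
    cmap = [0.0] * len(current)
    for a, b in segments:
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0  # assumed handling of empty energy
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap


# Hypothetical residual spectra: the first peak persists, the second is new.
current = [0.0, 3.0, 0.0, 4.0, 0.0]
previous = [0.0, 3.0, 0.0, 0.0, 0.0]
segments = [(0, 2), (2, 4)]  # pairs of minima delimiting each peak
cmap = correlation_map(current, previous, segments)
```

A peak that keeps its shape and position from one frame to the next scores close to 1, while a peak with no counterpart in the previous frame scores 0.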

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (frequency bands, band signals, band channel) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency bands, band signals, band channel) so as to produce a summed long-term correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
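
The one-pole filtering and summation of US8990073B2 claim 5 above can be sketched as below. The update form, the update factor of 0.5, the all-zero initial map and the function name are assumptions made for illustration; the claims state only that an update factor, the current correlation map and an initial value are used.

```python
def update_long_term_map(long_term, cmap, alpha):
    """One-pole filter applied bin by bin: alpha * old + (1 - alpha) * new."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


# Assumed initial value of the long-term correlation map.
long_term = [0.0, 0.0, 0.0]

# Hypothetical correlation maps for two consecutive frames.
frames = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
for cmap in frames:
    long_term = update_long_term_map(long_term, cmap, alpha=0.5)

# Summed long-term correlation map over all frequency bins.
summed = sum(long_term)
```

Bins that stay correlated across frames accumulate toward 1, so the sum grows for signals with stable tonal peaks and decays otherwise.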

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency bands, band signals, band channel) having a magnitude that exceeds a given fixed threshold .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
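
The strong-tone search of US8990073B2 claim 7 above reduces to a fixed-threshold comparison over the correlation map. The following is a minimal sketch; the function name, the sample map and the threshold of 0.95 are hypothetical values, since the claim does not state the threshold.

```python
def strong_tone_bins(cmap, threshold):
    """Indices of frequency bins whose correlation-map value exceeds the threshold."""
    return [i for i, value in enumerate(cmap) if value > threshold]


# Hypothetical correlation map and fixed threshold.
bins = strong_tone_bins([0.2, 0.97, 0.5, 0.99], threshold=0.95)
has_strong_tone = bool(bins)
```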

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6680972B1
CLAIM 18
. A decoder according to claim 17 , in which the decoded output signal is a monophonic audio signal , the decoder further comprising : a first delay and a first attenuator for forming a first delayed signal ;
a second delay being different from the first delay and a second attenuator for forming a second delayed signal ;
a first adder (background noise signal) for adding said decoded output signal and said first delayed signal , forming a left-channel output signal ;
and a second adder for adding said decoded output signal and said second delayed signal , forming a right-channel output signal ;
whereby obtaining a pseudo stereophonic signal .
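
To make the mapped structure concrete, the delay, attenuator and adder arrangement of US6680972B1 claim 18 above can be sketched as follows. This is an illustrative sketch only: the function names, the delays (1 and 2 samples) and the gains (0.5 and 0.25) are hypothetical, and each delay is assumed not to exceed the signal length.

```python
def delayed(signal, delay, gain):
    """Delay by `delay` samples (zero-padded at the start) and attenuate by `gain`."""
    return [0.0] * delay + [gain * s for s in signal[:len(signal) - delay]]


def pseudo_stereo(mono, d1, g1, d2, g2):
    """First adder forms the left channel, second adder forms the right channel."""
    left = [m + x for m, x in zip(mono, delayed(mono, d1, g1))]
    right = [m + x for m, x in zip(mono, delayed(mono, d2, g2))]
    return left, right


# A unit impulse as the monophonic decoded output signal.
left, right = pseudo_stereo([1.0, 0.0, 0.0, 0.0], d1=1, g1=0.5, d2=2, g2=0.25)
```

The sketch shows that the "first adder" sums a signal with a delayed copy of itself, which is a different operation from classifying a background noise signal.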

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (first adder) and prevent update of noise energy estimates on the music signal .
US6680972B1
CLAIM 18
. A decoder according to claim 17 , in which the decoded output signal is a monophonic audio signal , the decoder further comprising : a first delay and a first attenuator for forming a first delayed signal ;
a second delay being different from the first delay and a second attenuator for forming a second delayed signal ;
a first adder (background noise signal) for adding said decoded output signal and said first delayed signal , forming a left-channel output signal ;
and a second adder for adding said decoded output signal and said second delayed signal , forming a right-channel output signal ;
whereby obtaining a pseudo stereophonic signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands, band signals, band channel) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
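
The spectral diversity calculation of US8990073B2 claim 24 above can be sketched as a weighted sum of per-band energy ratios. This follows the claim wording literally and is not the patent's exact formula: the function name, the band energies, the weights and the cutoff band are hypothetical, and the previous-frame energies are assumed to be non-zero.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights):
    """Weighted sum of current/previous energy ratios over bands above `min_band`."""
    total = 0.0
    for band in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[band] / prev_energy[band]  # assumes prev_energy[band] > 0
        total += weights[band] * ratio
    return total


# Hypothetical per-band energies for the current and previous frames.
cur = [1.0, 4.0, 9.0, 8.0]
prev = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 0.5, 0.25]
diversity = spectral_diversity(cur, prev, min_band=1, weights=weights)
```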

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (frequency bands, band signals, band channel) into a first group (frequency bands, band signals, band channel) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy (frequency bands, band signals, band channel) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f dest =f source ·M±Δf source , wherein f dest is a destination frequency , f source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency bands, band signals, band channel) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f_dest = f_source · M ± Δf_source , wherein f_dest is a destination frequency , f_source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf_source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency bands, band signals, band channel) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f_dest = f_source · M ± Δf_source , wherein f_dest is a destination frequency , f_source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf_source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency bands, band signals, band channel) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f_dest = f_source · M ± Δf_source , wherein f_dest is a destination frequency , f_source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf_source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (frequency bands, band signals, band channel) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency bands, band signals, band channel) so as to produce a summed long-term correlation map .
US6680972B1
CLAIM 1
. A method for decoding an encoded signal , the encoded signal being derived from an original signal and representing only a portion of frequency bands (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) included in the original signal , comprising : providing subband samples for a plurality of subbands or a plurality of spectral coefficients , the subband samples or spectral coefficients representing the encoded signal ;
transposing subband samples or spectral coefficients , which represent source frequencies to corresponding destination frequencies in a reconstruction frequency band which is included in the original signal and which is not included in the encoded signal , wherein a destination frequency is related to a corresponding source frequency by means of the following equation : f_dest = f_source · M ± Δf_source , wherein f_dest is a destination frequency , f_source is a source frequency corresponding to the destination frequency , M is a transposition factor not equal to one , and Δf_source is a deviation from an exact transposition being greater than or equal to zero and smaller than 5 percent of a bandwidth of a critical band , in which the destination frequency is located , wherein , for each subband or frequency coefficient for a certain destination frequency , phase information for respective subband samples or a respective frequency coefficient is derived only from phase information from subband samples or a frequency coefficient for a certain source frequency which corresponds to the certain destination frequency , wherein the subband samples or spectral coefficients are adjusted using spectral envelope information derived from the original signal or the encoded signal to obtain adjusted transposed subband samples or adjusted transposed spectral coefficients , before or after the step of transposing ;
and combining the subband samples and the adjusted transposed subband samples or the spectral coefficients and the adjusted transposed spectral coefficients , such that a decoded output signal is obtained .

US6680972B1
CLAIM 5
. A method according to claim 4 , wherein the spectral envelope information is transmitted as subband samples in an arbitrary number of subband channels (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of the encoded signal , where the gains of said subband channels are set to a low level ;
whereby compatibility with standardised decoders is ensured .

US6680972B1
CLAIM 11
. A method according to claim 1 , wherein the step of providing includes the step of bandpass filtering a signal using an analysis filter bank or transform of such a nature that real- or complex-valued subband signals (frequency bands, first group, first frequency, first energy, frequency spectrum, frequency bins, frequency bin, frequency bin basis, frequency dependent signal, first frequency bands, first energy value) of lowpass type corresponding to source frequencies are generated ;
wherein the step of transposing includes the step of patching an arbitrary number of channels k of said analysis filter bank or transform to channels Mk , which correspond to destination frequencies , M≠1 , in a synthesis filter bank or transform ;
and wherein the filter bank or transform is used in the step of combining .
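
As a rough illustration of the long-term correlation map calculator recited in claim 33 of US8990073B2 above, the following Python sketch filters a per-bin correlation map and then sums the filtered map over the frequency bins. The claim names neither the filter type nor its coefficient, so the first-order recursive (exponential) form, the function name `summed_long_term_map`, and the default `alpha` are assumptions made only for illustration.

```python
def summed_long_term_map(lt_map, cor_map, alpha=0.9):
    """Filter the correlation map bin by bin, then sum over the bins.

    lt_map  : long-term correlation map from the previous frame (one value per bin)
    cor_map : correlation map of the current frame (one value per bin)
    alpha   : hypothetical update factor in [0, 1); not taken from the claim
    """
    if len(lt_map) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    # Assumed first-order recursive (exponential) filter, applied per bin.
    new_lt = [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cor_map)]
    # Adder: one scalar per frame summarising the filtered map.
    return new_lt, sum(new_lt)
```

The scalar returned second is the "summed long-term correlation map" in the sense of the claim; how it is thresholded is not specified in the claim text quoted here.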

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6680972B1
CLAIM 18
. A decoder according to claim 17 , in which the decoded output signal is a monophonic audio signal , the decoder further comprising : a first delay and a first attenuator for forming a first delayed signal ;
a second delay being different from the first delay and a second attenuator for forming a second delayed signal ;
a first adder (background noise signal) for adding said decoded output signal and said first delayed signal , forming a left-channel output signal ;
and a second adder for adding said decoded output signal and said second delayed signal , forming a right-channel output signal ;
whereby obtaining a pseudo stereophonic signal .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (first adder) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6680972B1
CLAIM 18
. A decoder according to claim 17 , in which the decoded output signal is a monophonic audio signal , the decoder further comprising : a first delay and a first attenuator for forming a first delayed signal ;
a second delay being different from the first delay and a second attenuator for forming a second delayed signal ;
a first adder (background noise signal) for adding said decoded output signal and said first delayed signal , forming a left-channel output signal ;
and a second adder for adding said decoded output signal and said second delayed signal , forming a right-channel output signal ;
whereby obtaining a pseudo stereophonic signal .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (bandpass filter) to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US6680972B1
CLAIM 9
. A method according claim 1 , wherein the step of providing includes a step of filtering a signal through a set of N≧2 bandpass filters (average signal) with passbands comprising the source frequencies [f_1 , ... , f_N] respectively , forming N bandpass signals ;
wherein the step of transposing includes the step of shifting the bandpass signals in frequency to regions comprising the destination frequencies M[f_1 , ... , f_N] .
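
For orientation, the comparator recited in claim 38 of US8990073B2 above can be sketched in Python as follows. The claim only states that the threshold is "a function of" the long-term SNR; the linear form and the constants `slope` and `offset` below are hypothetical placeholders, not values taken from the patent.

```python
def snr_threshold(snr_lt, slope=0.1, offset=2.0):
    """Hypothetical linear mapping from long-term SNR (dB) to a threshold (dB)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR of the frame with the SNR_LT-dependent threshold."""
    return snr_av > snr_threshold(snr_lt)
```

With these placeholder constants, a frame is flagged active when its average SNR exceeds a threshold that rises as the long-term SNR rises; any monotone function could be substituted without changing the structure of the comparison.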

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (first adder) and preventing update of noise energy estimates .
US6680972B1
CLAIM 18
. A decoder according to claim 17 , in which the decoded output signal is a monophonic audio signal , the decoder further comprising : a first delay and a first attenuator for forming a first delayed signal ;
a second delay being different from the first delay and a second attenuator for forming a second delayed signal ;
a first adder (background noise signal) for adding said decoded output signal and said first delayed signal , forming a left-channel output signal ;
and a second adder for adding said decoded output signal and said second delayed signal , forming a right-channel output signal ;
whereby obtaining a pseudo stereophonic signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US20030009325A1

Filed: 1999-01-22     Issued: 2003-01-09

Method for signal controlled switching between different audio coding schemes

(Original Assignee) Deutsche Telekom AG     (Current Assignee) Deutsche Telekom AG

Raif Kirchherr, Joachim Stegmann
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long term correlation map .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .
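
The steps recited in claims 1 and 2 of US8990073B2 above (spectral floor from minima, residual spectrum, peaks between successive minima, per-peak correlation map, long-term map) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the piecewise-linear floor, the squared normalized correlation, the exponential update, and all function names are illustrative choices, since the claim text quoted here does not fix any of these formulas.

```python
def find_minima(spec):
    """Indices of local minima; both endpoints are added so segments span all bins."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def residual_spectrum(spec):
    """Subtract a floor that connects the minima (assumed: straight lines)."""
    minima = find_minima(spec)
    floor = list(spec)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spec[a] + t * (spec[b] - spec[a])
    return [s - f for s, f in zip(spec, floor)], minima


def correlation_map(res_cur, res_prev):
    """One value per peak, written to every bin of that peak.

    A peak is the piece of the current residual between two successive minima.
    Assumed measure: squared normalized correlation with the previous residual
    over the same bins, which lies in [0, 1].
    """
    minima = find_minima(res_cur)
    cmap = [0.0] * len(res_cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(res_cur[i] * res_prev[i] for i in bins)
        den = sum(res_cur[i] ** 2 for i in bins) * sum(res_prev[i] ** 2 for i in bins)
        value = num * num / den if den > 0 else 0.0
        for i in bins:
            cmap[i] = value
    return cmap


def update_long_term(lt_map, cmap, alpha=0.9):
    """Assumed exponential update; alpha plays the role of the update factor."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]
```

A caller would keep the previous frame's residual and the long-term map as state, initialise the long-term map to some starting value (for example all zeros), and call the three functions once per frame.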

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (previous frame, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signals (activity prediction parameter, noise character parameter) ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (current frame) energy and an average frame energy .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (current frame) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .
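
The spectral diversity calculation recited in claim 24 of US8990073B2 above can be illustrated with a short Python sketch. The band indexing (strictly greater than `given_band`), the uniform default weights, and the `eps` guard against division by zero are assumptions; the claim text quoted here does not specify the weights or the direction of the ratio.

```python
def spectral_diversity(e_cur, e_prev, given_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands above `given_band`.

    e_cur, e_prev : per-band energies of the current and previous frame
    given_band    : only bands with index > given_band contribute (assumed)
    weights       : hypothetical per-band weights; defaults to 1.0 for every band
    """
    if len(e_cur) != len(e_prev):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for b in range(given_band + 1, len(e_cur)):
        ratio = e_cur[b] / max(e_prev[b], eps)  # current over previous (assumed)
        total += weights[b] * ratio
    return total
```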

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame, speech signal) indicative of an activity of the sound signal .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signals (activity prediction parameter, noise character parameter) ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame, speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signals (activity prediction parameter, noise character parameter) ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signals (activity prediction parameter, noise character parameter) ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (previous frame, speech signal) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signal (activity prediction parameter, noise character parameter) s ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .
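
For orientation, the computation recited in claim 28 of US8990073B2 (split the bands into two groups, take an energy value per group, form their ratio, then track a long-term value) can be sketched as follows. This is a minimal illustration, not the patentee's implementation: the function name, the use of a plain sum as the "energy value", the exponential smoothing used for the long-term value, and the default alpha are all assumptions made here.

```python
def noise_character(band_energies, n_first, prev_long_term, alpha=0.9):
    """Return (ratio, long_term) for one frame.

    band_energies  : per-band energies for the current frame
    n_first        : number of bands placed in the first group (assumed split)
    prev_long_term : long-term value carried over from the previous frame
    alpha          : smoothing factor (hypothetical value, not from the claim)
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave both groups non-empty")
    first = sum(band_energies[:n_first])    # first energy value
    second = sum(band_energies[n_first:])   # second energy value
    eps = 1e-12                             # guards against division by zero
    ratio = first / (second + eps)          # noise character parameter
    # Long-term value: simple exponential smoothing (an assumption).
    long_term = alpha * prev_long_term + (1.0 - alpha) * ratio
    return ratio, long_term
```

The prior-art claims charted against claim 28 recite speech/non-speech classification and LSF differences between frames; nothing in the quoted text recites a band-energy ratio of this kind.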

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (previous frame, speech signal) inferior than a given fixed threshold .
US20030009325A1
CLAIM 1
. A method for signal controlled switching between audio coding schemes comprising : receiving input audio signals ;
classifying a first set of the input audio signals as speech or non-speech signal (activity prediction parameter, noise character parameter) s ;
coding the speech signals using a time domain coding scheme ;
and coding the nonspeech signals using a transform coding scheme .

US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame to be classified and a previous frame (activity prediction parameter, noise character parameter) , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (current frame) , and an initial value of the long-term correlation map .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (current frame) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US20030009325A1
CLAIM 5
. The method as recited in claim 4 further comprising sampling the input audio signals so as to form a plurality of frames , the plurality of frames including a current frame (current frame) to be classified and a previous frame , the classifying step further including determining a difference between LSF coefficients of the current frame and the previous frame .
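
The three elements of claim 32 of US8990073B2 (locate the minima, connect them into a spectral floor, subtract the floor) can be illustrated with the sketch below. It assumes a piecewise-linear connection between minima and treats the first and last bins as minima; the claim text fixes neither choice, and the function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; end bins are always included (assumption)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    if n > 1:
        minima.append(n - 1)
    return minima


def residual_spectrum(spectrum):
    """Return (residual, minima) for one frame's magnitude spectrum."""
    minima = find_minima(spectrum)
    floor = list(spectrum)
    # Spectral floor: straight line between each pair of successive minima.
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    # Residual: spectrum minus floor. No clamping is applied here.
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

Each stretch of the residual between two successive minima is what claims 30 and 31 call a detected peak.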




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6266633B1

Filed: 1998-12-22     Issued: 2001-07-24

Noise suppression and channel equalization preprocessor for speech and speaker recognizers: method and apparatus

(Original Assignee) ITT Manufacturing Enterprises LLC     (Current Assignee) Harris Corp

Alan Lawrence Higgins, Steven F. Boll, Jack E. Porter
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map (probability density function) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map (probability density function) comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
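
To make the comparison concrete, the correlation map of claim 4 of US8990073B2 can be sketched as a per-peak normalized correlation written back over the bins of that peak. The normalization used below (cross term divided by the geometric mean of the two energies) is an assumption, as are the function name and the zero score for an empty reference; the claim only requires "a normalized correlation value".

```python
import math


def correlation_map(cur_res, prev_res, minima):
    """Per-bin correlation map for one frame.

    cur_res, prev_res : current and previous residual spectra (same length)
    minima            : sorted indices of minima in the current residual
    """
    cmap = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur_res[i] * prev_res[i] for i in bins)
        e_cur = sum(cur_res[i] ** 2 for i in bins)
        e_prev = sum(prev_res[i] ** 2 for i in bins)
        denom = math.sqrt(e_cur * e_prev)
        score = num / denom if denom > 0.0 else 0.0
        # Assign the peak's score to every bin that the peak spans.
        # A minimum shared by two peaks keeps the later peak's score.
        for i in bins:
            cmap[i] = score
    return cmap
```

By contrast, the probability density function in claim 26 of US6266633B1 describes the distribution of magnitude values at each frequency and is used to pick a noise component value; the quoted text does not recite a frame-to-frame correlation.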

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map (probability density function) comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice signal comprising : sampling said noisy voice signal at a predetermined sampling rate (frequency bin basis) f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 3
. The method according to claim 2 , wherein the step of transforming each of said frames to a magnitude and phase representation as a function of frequency comprises performing a 1024-point fast Fourier transform (frequency bin) (FFT) on each said frame to provide magnitude values M ft of said spectral samples where t represents the frame number (t=0 , 1 , . . . , 511) and f represents a particular frequency within said set of discrete frequencies .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
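
As a reading aid for claim 5 of US8990073B2, the sketch below applies a one-pole (first-order recursive) filter to the correlation map bin by bin and then sums the result. The update factor value, the particular recursion form, and the function names are assumptions; claim 1 only states that the long-term map depends on an update factor, the current frame's correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """One-pole filter per frequency bin.

    long_term : long-term correlation map from the previous frame
                (the initial value on the first frame)
    cmap      : correlation map of the current frame
    alpha     : update factor (hypothetical default)
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)


# Usage: start from an all-zero initial map and feed two identical frames.
state = [0.0, 0.0, 0.0]
for _ in range(2):
    state = update_long_term_map(state, [1.0, 0.0, 1.0], alpha=0.5)
total = summed_map(state)
```

A bin that stays correlated from frame to frame converges toward 1 in the filtered map, which is why the summed value can serve as an indicator of tonal stability.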

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise component) .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises searching in the correlation map (probability density function) for frequency bins having a magnitude that exceeds a given fixed threshold .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises comparing the summed long-term correlation map (probability density function) with an adaptive threshold (input voice, sampling means) indicative of sound activity (noise component) in the sound signal .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
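
For reference, the noise estimate recited in claim 23 of US6266633B1 (histogram of magnitudes per frequency, peak of the histogram taken as the noise amplitude N f, subtraction with negative results set to zero) can be sketched as below. Reading "peak amplitude associated with said histogram" as the histogram mode is an assumption, as are the bin width, the tie-breaking rule, and the function names.

```python
from collections import Counter


def noise_amplitude(magnitudes_over_time, bin_width=1.0):
    """Estimate N_f for one frequency from its magnitudes over an utterance."""
    if not magnitudes_over_time:
        raise ValueError("need at least one magnitude value")
    counts = Counter(int(m // bin_width) for m in magnitudes_over_time)
    best = max(counts.values())
    # Ties go to the lowest histogram bin (an arbitrary choice).
    mode_bin = min(b for b, c in counts.items() if c == best)
    return (mode_bin + 0.5) * bin_width  # centre of the most populated bin


def spectral_subtract(frame, noise):
    """Subtract N_f from each magnitude, setting negative results to zero."""
    return [max(m - n, 0.0) for m, n in zip(frame, noise)]
```

This estimate is fixed per utterance and per frequency; the quoted claim text does not describe a threshold that adapts from frame to frame.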

US8990073B2
CLAIM 10
. A method for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (noise component) is detected .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (noise component) in the sound signal (noise component) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises detecting the sound signal (noise component) based on a frequency dependent signal-to-noise ratio (SNR) .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .
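
For orientation only, the noise-threshold and subtraction steps recited in US6266633B1 claim 1 can be sketched as below. This is a minimal illustration, not the reference's disclosed implementation: the function names, the `bin_width` parameter, and the tie-breaking rule are assumptions, and the clamp at zero is borrowed from claims 7 and 23 of the same reference rather than from claim 1.

```python
from collections import Counter

def noise_threshold_from_histogram(magnitudes, bin_width=1.0):
    """Return the centre of the most populated histogram bin for one frequency."""
    # bin_width is a hypothetical histogram resolution, not taken from the reference.
    counts = Counter(int(m // bin_width) for m in magnitudes)
    # Pick the fullest bin; ties go to the lower bin (an arbitrary choice).
    peak_bin = max(counts, key=lambda b: (counts[b], -b))
    return (peak_bin + 0.5) * bin_width

def subtract_noise(frames, bin_width=1.0):
    """frames: list of per-frame magnitude lists, one value per frequency index."""
    n_freqs = len(frames[0])
    thresholds = [
        noise_threshold_from_histogram([frame[f] for frame in frames], bin_width)
        for f in range(n_freqs)
    ]
    # Subtract N_f per frequency and set negative results to zero.
    suppressed = [
        [max(frame[f] - thresholds[f], 0.0) for f in range(n_freqs)]
        for frame in frames
    ]
    return thresholds, suppressed
```

The histogram peak is treated here as the noise level because, in the reference's framing, the most frequently observed magnitude at a given frequency is attributed to noise rather than to speech.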

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
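
As a reading aid for US8990073B2 claim 14 (and the active/inactive decision of dependent claims 18 and 19), the comparison of SNR av against a threshold computed from SNR LT might look like the sketch below. The linear form of the threshold and the `slope` and `offset` values are purely hypothetical; the claim only requires that the threshold be some function of the long-term SNR.

```python
def vad_threshold(snr_lt_db, slope=0.5, offset=10.0):
    """Threshold as a function of the long-term SNR (hypothetical linear form)."""
    return slope * snr_lt_db + offset

def is_active(snr_av_db, snr_lt_db):
    """True for an active sound signal, False for an inactive one."""
    # Active when the average SNR is larger than the threshold (claim 19);
    # inactive when it is inferior to the threshold (claim 18).
    return snr_av_db > vad_threshold(snr_lt_db)
```

Nothing comparable to this adaptive comparison is recited in the mapped US6266633B1 claims, which determine a per-frequency noise level and subtract it.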

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection in the sound signal (noise component) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection further comprises updating the noise estimates for a next frame .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice (sound activity detection, adaptive threshold) signal comprising : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectural sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said nose threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 23
. In a speech verification system for verifying a voice of a user including means for prompting said user to speak in a limited vocabulary comprising an at least one utterance , sampling means (sound activity detection, adaptive threshold) for sampling said at least one utterance at a predetermined rate to provide a sampled input signal , verification means for comparing a preprocessed signal indicative of said at least one speech utterance with a prestored voice model of said user to authenticate said user , a method for preprocessing said sampled input signal indicative of said speech utterance for output to said verification means comprising the steps of : converting said sampled input signal into a plurality of speech frames having a predetermined number of samples per frame ;
processing said plurality of speech frames by sequentially performing N-point discrete Fourier transform on each said speech frame to provide a spectral sample sequence corresponding to a given frame ;
determining the magnitudes of said spectral sample sequence and generating a histogram of the magnitude as a function of a discrete set of frequencies over all samples comprising the speech utterance ;
detecting a peak amplitude associated with said histogram over said entire utterance to determine a noise amplitude N f at each corresponding frequency within the discrete set of frequencies ;
determining a channel frequency response C f based on said detected noise amplitude N f ;
subtracting from the magnitude of each said spectral sample said noise amplitude N f and setting any negative results of said subtraction to zero , to provide a subtracted sample sequence ;
filtering said subtracted sample sequence via a blind deconvolution filter having a frequency response inversely proportional to the channel frequency response C f to provide a channel equalized spectral sample sequence ;
converting said channel equalized spectral sample sequence to a temporal sequence by performing an N point inverse discrete Fourier transform ;
and accumulating and shifting said temporal sequence according to the frame period to provide said preprocessed signal for input to said verification system .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
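
The frame-to-frame dependency in US8990073B2 claims 13, 15 and 16 (a frequency dependent SNR, computed with noise energy estimates from the previous frame, followed by an update of those estimates for the next frame) can be illustrated as follows. This is a sketch under assumptions: the averaging of per-band ratios, the smoothing factor `alpha`, and the `allow_update` flag are hypothetical and are not taken from either patent.

```python
import math

def average_snr_db(band_energies, prev_noise, floor=1e-9):
    """Average per-band SNR in dB, using noise estimates from the previous frame."""
    ratios = [max(e, floor) / max(n, floor)
              for e, n in zip(band_energies, prev_noise)]
    return 10.0 * math.log10(sum(ratios) / len(ratios))

def update_noise(prev_noise, band_energies, allow_update, alpha=0.9):
    """Return the noise estimates to be used when processing the next frame."""
    if not allow_update:
        # Estimates are frozen, e.g. when an update decision forbids the update.
        return list(prev_noise)
    # Hypothetical exponential smoothing toward the current band energies.
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(prev_noise, band_energies)]
```

In use, `average_snr_db` would be called first with the stored estimates, and `update_noise` afterwards, so that the SNR of a frame never depends on noise estimates derived from that same frame.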

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise component) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise component) prevents updating of noise energy estimates when a music signal is detected .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6266633B1
CLAIM 8
. A method for performing noise suppression and channel equalization of a noisy voice signal comprising the steps of : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames having a predetermined number of samples per frame , over a predetermined temporal window ;
generating an N-point spectral sample representation of each of said sample signal frames ;
determining the magnitude of each of said N-point spectral samples and generating a histogram of the energy associated with each of said N-point spectral samples at a particular frequency ;
detecting a peak amplitude of said histogram which corresponds to a noise threshold N f associated with each said particular frequency ;
determining a channel frequency response C f associated with each said particular frequency by determining a geometric mean over all said spectral samples having magnitudes exceeding said noise threshold N f ;
subtracting from each of the magnitudes of the N point spectral samples the noise threshold N f to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
transforming said deconvolved noise suppressed sampled sequence to a temporal representation ;
shifting said temporal sample sequence in time by a predetermined amount ;
and adding said time (noise character parameter) shifted temporal samples over a period corresponding to said predetermined temporal window to provide a suppressed noise voice signal .
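
The final two steps of US6266633B1 claim 8 (shifting the temporal sequences in time and adding the shifted samples) describe what is commonly called overlap-add. A minimal sketch, assuming equal-length frames and a hypothetical `hop` parameter for the shift in samples:

```python
def overlap_add(frames, hop):
    """Sum equal-length frames, each shifted by `hop` samples from the previous one."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, sample in enumerate(frame):
            out[k * hop + i] += sample
    return out
```

This is signal reconstruction only; it does not compute a parameter of the signal, which is relevant when weighing the chart's mapping of "said time" to the noise character parameter of US8990073B2 claim 21.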

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise component) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;
and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
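
To make the arithmetic of US8990073B2 claim 24 concrete, the spectral diversity computation can be sketched as below. The function name, the `floor` guard against division by zero, and the default unit weights are assumptions; the claim specifies only a per-band ratio of current to previous frame energy and a weighted sum over bands above a given number.

```python
def spectral_diversity(curr, prev, given_band, weights=None, floor=1e-9):
    """Weighted sum of current/previous energy ratios for bands above given_band."""
    if weights is None:
        weights = [1.0] * len(curr)  # hypothetical default: equal weights
    total = 0.0
    for i in range(given_band + 1, len(curr)):
        ratio = max(curr[i], floor) / max(prev[i], floor)
        total += weights[i] * ratio
    return total
```

For example, with band energies `[1, 2, 4, 8]` in the current frame, `[1, 1, 2, 2]` in the previous frame and `given_band=1`, only bands 2 and 3 contribute, giving ratios 2 and 4.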

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (negative value) indicative of an activity of the sound signal (noise component) .
US6266633B1
CLAIM 7
. The method according to claim 1 , wherein the step of subtracting N f from each of the magnitudes further comprises setting any negative value (activity prediction parameter) s of said noise suppressed sample sequence to zero prior to the step of applying blind deconvolution .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectural sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (negative value) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise component) and the complementary non-stationarity parameter .
US6266633B1
CLAIM 7
. The method according to claim 1 , wherein the step of subtracting N f from each of the magnitudes further comprises setting any negative values (activity prediction parameter) of said noise suppressed sample sequence to zero prior to the step of applying blind deconvolution .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (negative value) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US6266633B1
CLAIM 7
. The method according to claim 1 , wherein the step of subtracting N f from each of the magnitudes further comprises setting any negative values (activity prediction parameter) of said noise suppressed sample sequence to zero prior to the step of applying blind deconvolution .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6266633B1
CLAIM 8
. A method for performing noise suppression and channel equalization of a noisy voice signal comprising the steps of : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames having a predetermined number of samples per frame , over a predetermined temporal window ;
generating an N-point spectral sample representation of each of said sample signal frames ;
determining the magnitude of each of said N-point spectral samples and generating a histogram of the energy associated with each of said N-point spectral samples at a particular frequency ;
detecting a peak amplitude of said histogram which corresponds to a noise threshold N f associated with each said particular frequency ;
determining a channel frequency response C f associated with each said particular frequency by determining a geometric mean over all said spectral samples having magnitudes exceeding said noise threshold N f ;
subtracting from each of the magnitudes of the N point spectral samples the noise threshold N f to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
transforming said deconvolved noise suppressed sampled sequence to a temporal representation ;
shifting said temporal sample sequence in time by a predetermined amount ;
and adding said time (noise character parameter) shifted temporal samples over a period corresponding to said predetermined temporal window to provide a suppressed noise voice signal .
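
The noise character parameter of US8990073B2 claim 28 above (split the bands into two groups, take the ratio of the group energies, then track a long-term value) might be sketched as follows. The direction of the ratio (first group over second group), the exponential smoothing used for the long-term value, and the names `noise_character`, `update_long_term`, `n_first` and `alpha` are all assumptions made for illustration; the claim text quoted here does not fix them.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    if not 0 < n_first < len(band_energies):
        raise ValueError("both groups must contain at least one band")
    first = sum(band_energies[:n_first])    # first group of frequency bands
    second = sum(band_energies[n_first:])   # rest of the frequency bands
    if second == 0:
        raise ValueError("second-group energy is zero; ratio undefined")
    return first / second


def update_long_term(previous, current, alpha=0.9):
    """Assumed first-order smoothing of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * current


ratio = noise_character([1.0, 1.0, 2.0, 4.0], n_first=2)
long_term = update_long_term(previous=1.0, current=ratio, alpha=0.5)
```

By contrast, the mapped phrase in US6266633B1 claim 8 ("said time") refers to a time shift of output samples, not to an energy ratio.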

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
US6266633B1
CLAIM 8
. A method for performing noise suppression and channel equalization of a noisy voice signal comprising the steps of : sampling said noisy voice signal at a predetermined sampling rate f s ;
segmenting said sampled voice signal into a plurality of frames having a predetermined number of samples per frame , over a predetermined temporal window ;
generating an N-point spectral sample representation of each of said sample signal frames ;
determining the magnitude of each of said N-point spectral samples and generating a histogram of the energy associated with each of said N-point spectral samples at a particular frequency ;
detecting a peak amplitude of said histogram which corresponds to a noise threshold N f associated with each said particular frequency ;
determining a channel frequency response C f associated with each said particular frequency by determining a geometric mean over all said spectral samples having magnitudes exceeding said noise threshold N f ;
subtracting from each of the magnitudes of the N point spectral samples the noise threshold N f to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
transforming said deconvolved noise suppressed sampled sequence to a temporal representation ;
shifting said temporal sample sequence in time by a predetermined amount ;
and adding said time (noise character parameter) shifted temporal samples over a period corresponding to said predetermined temporal window to provide a suppressed noise voice signal .
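
The histogram step of US6266633B1 claim 8 above (build a histogram of the energy at one frequency and take its peak as the noise threshold N f) might be sketched as follows. The bin count, the use of the bin center as the threshold, and the function name `histogram_noise_threshold` are assumptions for illustration only.

```python
def histogram_noise_threshold(values, n_bins=10):
    """Estimate a noise threshold for one frequency as the histogram mode.

    values: magnitudes (or energies) observed at that frequency across frames.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all observations identical; no histogram needed
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))   # most populated bin
    return lo + (peak + 0.5) * width   # center of that bin


n_f = histogram_noise_threshold([1.0, 1.1, 1.05, 1.2, 5.0, 9.0, 0.0, 10.0])
```

The idea is that most frames at a given frequency contain only noise, so the most frequent level is taken as the noise level.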

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map (probability density function) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map (probability density function) between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map (probability density function) comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6266633B1
CLAIM 1
. A method for combining noise suppression and channel equalization in a preprocessor for enhancing the quality of a noisy input voice signal comprising : sampling said noisy voice signal at a predetermined sampling rate (frequency bin basis) f s ;
segmenting said sampled voice signal into a plurality of frames ;
transforming each of said frames into a magnitude and phase spectral sample representation as a function of a predetermined set of discrete frequencies f ;
determining a noise threshold N f associated with each frequency f ;
determining a channel frequency response C f associated with each frequency f according to said noise threshold N f ;
subtracting said noise threshold N f from each of the magnitudes of the spectral samples to provide a noise suppressed sample sequence ;
applying blind deconvolution to said noise suppressed samples ;
and transforming said deconvolved noise suppressed sampled sequence to a temporal representation to provide a noise reduced output signal indicative of said input voice signal ;
wherein said noise threshold N f of each frequency f is at least partially based upon data indicative of a spectral magnitude histogram .

US6266633B1
CLAIM 3
. The method according to claim 2 , wherein the step of transforming each of said frames to a magnitude and phase representation as a function of frequency comprises performing a 1024-point fast Fourier transform (frequency bin) (FFT) on each said frame to provide magnitude values M ft of said spectral samples where t represents the frame number (t=0 , 1 , . . . , 511) and f represents a particular frequency within said set of discrete frequencies .

US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component value associated with each frequency within said set of frequencies based on a probability density function (correlation map) of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
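
The per-bin filtering and summation of US8990073B2 claim 33 above, together with the update factor and initial value named in claim 31, might be sketched as follows. The first-order recursive form of the filter and the names `update_long_term_map`, `summed_map` and `alpha` are assumptions; the claims quoted here only say that an update factor, the current correlation map and an initial value are used.

```python
def update_long_term_map(long_term, cor_map, alpha):
    """Filter the correlation map bin by bin (assumed first-order recursion).

    long_term: previous long-term map (its initial value on the first frame).
    cor_map:   correlation map of the current frame.
    alpha:     update factor, assumed to lie in [0, 1].
    """
    if len(long_term) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cor_map)]


def summed_map(long_term):
    """Sum the filtered map over the frequency bins."""
    return sum(long_term)


state = [0.0, 0.0]                      # assumed initial value of the long-term map
state = update_long_term_map(state, [1.0, 0.5], alpha=0.5)
total = summed_map(state)
```

A single summed value of this kind could then be compared with a threshold to decide whether the signal is tonally stable; that comparison is not shown because the claims quoted here do not describe it.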

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise component) .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 35
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 36
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (noise component) detector .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (noise component) detector comprises a comparator of an average signal (noise component) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .
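
The comparator of US8990073B2 claim 38 above (average SNR compared with a threshold that depends on the long-term SNR) might be sketched as follows. The claim text quoted here does not give the threshold function, so the linear form and the `slope` and `intercept` values below are purely hypothetical placeholders, as is the function name `snr_vad_decision`.

```python
def snr_vad_decision(snr_av, snr_lt, slope=0.1, intercept=2.0):
    """Return True (active) when the average SNR exceeds an adaptive threshold.

    The threshold is an assumed linear function of the long-term SNR;
    slope and intercept are illustrative values, not taken from any source.
    """
    threshold = slope * snr_lt + intercept
    return snr_av > threshold


active = snr_vad_decision(snr_av=5.0, snr_lt=20.0)    # assumed threshold is 4.0
inactive = snr_vad_decision(snr_av=3.0, snr_lt=20.0)
```

Making the threshold depend on the long-term SNR lets the detector be stricter in clean conditions and more permissive in noisy ones, or the reverse, depending on the chosen function.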

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (noise component) detector .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise component) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise component) .
US6266633B1
CLAIM 26
. An apparatus for performing noise suppression and channel equalization of input speech utterances comprising : fourier transform means for converting sampled speech frames into a spectral sequence representation of magnitude values corresponding to a predetermined set of frequencies ;
noise suppression means responsive to said magnitude values for determining a noise component (average signal, sound signal, sound activity, sound activity detector, detecting sound activity, sound signal prevents updating) value associated with each frequency within said set of frequencies based on a probability density function of the magnitude values at each frequency and subtracting the noise component value from said magnitude values to produce a noise suppressed spectral sequence ;
filter means responsive to said suppressed spectral sequence for performing channel equalization using blind deconvolution to provide a processed magnitude spectral sequence ;
inverse fourier transform means responsive to said processed magnitude spectral sequence for transforming said processed magnitude spectral sequence into a temporal output sequence indicative of said input speech utterances having noise suppressed and channel equalized characteristics .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6240386B1

Filed: 1998-11-24     Issued: 2001-05-29

Speech codec employing noise classification for noise compensation

(Original Assignee) Lakestar Semi Inc     (Current Assignee) MACOM Technology Solutions Holdings Inc

Jes Thyssen, Huan-Yu Su, Yang Gao, Adil Benyassine
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (source encoding) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .
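
The three steps of US8990073B2 claim 2 above (search for minima, connect them to form a spectral floor, subtract the floor) might be sketched as follows. Treating the two end bins as minima, using straight-line interpolation between minima, and the names `find_minima`, `spectral_floor` and `residual_spectrum` are assumptions made for illustration.

```python
def find_minima(spectrum):
    """Indices of local minima; the end bins are included by assumption."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    interior = [i for i in range(1, n - 1)
                if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]
    return [0] + interior + [n - 1]


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor; also return the minima that delimit peaks."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima


residual, minima = residual_spectrum([1.0, 3.0, 1.0, 4.0, 2.0])
```

In this toy spectrum the minima fall at bins 0, 2 and 4, so the residual keeps two peaks, one between each pair of successive minima.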

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (source encoding) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (source encoding) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .
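
The correlation map of claim 4 of US8990073B2 can be sketched as below: each peak is scored by a normalized correlation with the previous residual spectrum over the bins between its two delimiting minima, and that score is written into every bin of the segment. The exact normalization is an assumption (a standard normalized cross-correlation is used here), as are the function names and the rule that a shared boundary bin takes the score of the later segment.

```python
import math


def normalized_correlation(cur, prev, a, b):
    """Normalized cross-correlation of cur and prev over bins a..b inclusive."""
    num = sum(cur[i] * prev[i] for i in range(a, b + 1))
    den = math.sqrt(
        sum(cur[i] ** 2 for i in range(a, b + 1))
        * sum(prev[i] ** 2 for i in range(a, b + 1))
    )
    return num / den if den > 0.0 else 0.0


def correlation_map(cur, prev, minima):
    """Assign each peak's score to all bins between its delimiting minima."""
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        score = normalized_correlation(cur, prev, a, b)
        for i in range(a, b + 1):
            cmap[i] = score  # a shared boundary bin keeps the later segment's score
    return cmap


if __name__ == "__main__":
    cur = [0.0, 2.0, 0.0, 4.0, 0.0]
    prev = [0.0, 2.0, 0.0, 0.0, 0.0]
    print(correlation_map(cur, prev, [0, 2, 4]))
```

In this example the first peak is unchanged between frames and scores 1.0, while the second peak has no counterpart in the previous frame and scores 0.0.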

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates for a next frame (when r) .
US6240386B1
CLAIM 6
. The speech codec of claim 1 , wherein at least one of the encoder and the decoder smoothes a gain when r (next frame) eproducing the speech signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (when r) comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error (source encoding) energies .
US6240386B1
CLAIM 6
. The speech codec of claim 1 , wherein at least one of the encoder and the decoder smoothes a gain when r (next frame) eproducing the speech signal .

US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (source encoding) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (source encoding) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6240386B1
CLAIM 8
. The speech codec of claim 1 , wherein the encoder performs at least a portion of the noise classification and at least a portion of the noise compensation through selection of one of a plurality of source encoding (current residual spectrum, residual error) approaches .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6122610A

Filed: 1998-09-23     Issued: 2000-09-19

Noise suppression for low bitrate speech coder

(Original Assignee) Verance Corp     (Current Assignee) GCOMM Corp ; Verance Corp

Steven H. Isabelle
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (domain representations) with the previous residual spectrum , over frequency bins (high frequency components, spectral gain) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6122610A
CLAIM 1
. A method for suppressing noise in an input signal that carries a combination of noise and speech , comprising the steps of : dividing said input signal into signal blocks ;
applying a Discrete Fourier Transform (DFT) to the signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
converting the frequency domain representations (correlation value) of the signal blocks to magnitude-only signals ;
and averaging the magnitude-only signals across different frequency bands to provide an estimate of a short-time perceptual band spectrum of the input signal ;
wherein each of the different frequency bands is correlated with an associated plurality of the DFT bins ;
determining , at various points in time , whether said input signal is carrying noise only , or a combination of noise and speech , and , when the input signal is carrying noise only , using the corresponding estimated short-time perceptual band spectrum of the input signal to update an estimate of a long term perceptual band spectrum of the noise ;
determining a noise suppression frequency response based on said estimate of the long term perceptual band spectrum of the noise and the estimated short-time perceptual band spectrum of the input signal ;
and providing an all-pole time-domain filter in accordance with said noise suppression frequency response for time-domain shaping of a current block of the input signal to suppress noise therein .

US6122610A
CLAIM 2
. The method of claim 1 , comprising the further step of : pre-filtering said input signal prior to applying the DFT to emphasize high frequency components (frequency bins) thereof .

US6122610A
CLAIM 9
. An apparatus for suppressing noise in an input signal that carries a combination of noise and speech , comprising : a signal preprocessor for dividing said input signal into signal blocks ;
a Discrete Fourier transform (DFT) processor for processing said signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
means for computing a magnitude of said complex-valued frequency domain representation to provide a frequency domain magnitude spectrum ;
an accumulator for accumulating said frequency domain magnitude spectrum into a perceptual-band spectrum comprising frequency bands of unequal width ;
wherein values of the frequency domain magnitude spectrum are accumulated from different frequency bands , each of which is correlated with an associated plurality of the DFT bins ;
a filter for filtering the perceptual-band spectrum to generate an estimate of a short-time perceptual-band spectrum comprising a current segment of the input signal ;
a speech/pause detector for determining whether said input signal is currently noise only or a combination of speech and noise ;
a noise spectrum estimator responsive to said speech/pause detector when the input signal is noise only for updating an estimate of a long term perceptual band spectrum of the noise based on the estimated short-time perceptual band spectrum of the input signal ;
a spectral gain (frequency bins) processor responsive to said noise spectrum estimator for determining a noise suppression frequency response ;
and a spectral shaping processor comprising an all-pole time-domain filter that is responsive to said spectral gain processor for time-domain shaping of a current block of the input signal to suppress noise therein .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (high frequency components, spectral gain) so as to produce a summed long-term correlation map .
US6122610A
CLAIM 2
. The method of claim 1 , comprising the further step of : pre-filtering said input signal prior to applying the DFT to emphasize high frequency components (frequency bins) thereof .

US6122610A
CLAIM 9
. An apparatus for suppressing noise in an input signal that carries a combination of noise and speech , comprising : a signal preprocessor for dividing said input signal into signal blocks ;
a Discrete Fourier transform (DFT) processor for processing said signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
means for computing a magnitude of said complex-valued frequency domain representation to provide a frequency domain magnitude spectrum ;
an accumulator for accumulating said frequency domain magnitude spectrum into a perceptual-band spectrum comprising frequency bands of unequal width ;
wherein values of the frequency domain magnitude spectrum are accumulated from different frequency bands , each of which is correlated with an associated plurality of the DFT bins ;
a filter for filtering the perceptual-band spectrum to generate an estimate of a short-time perceptual-band spectrum comprising a current segment of the input signal ;
a speech/pause detector for determining whether said input signal is currently noise only or a combination of speech and noise ;
a noise spectrum estimator responsive to said speech/pause detector when the input signal is noise only for updating an estimate of a long term perceptual band spectrum of the noise based on the estimated short-time perceptual band spectrum of the input signal ;
a spectral gain (frequency bins) processor responsive to said noise spectrum estimator for determining a noise suppression frequency response ;
and a spectral shaping processor comprising an all-pole time-domain filter that is responsive to said spectral gain processor for time-domain shaping of a current block of the input signal to suppress noise therein .
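
The long-term correlation map of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over bins) can be illustrated as follows. The recursion form, the name alpha for the update factor, and the all-zero initial value are assumptions chosen for the sketch; the claims only require an update factor, the current correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha):
    """One-pole smoothing per bin: new = alpha * old + (1 - alpha) * current."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_map(long_term):
    """Sum of the filtered map over all frequency bins."""
    return sum(long_term)


if __name__ == "__main__":
    long_term = [0.0, 0.0, 0.0]   # assumed initial value
    cmap = [1.0, 0.0, 1.0]        # hypothetical per-frame correlation map
    for _ in range(2):
        long_term = update_long_term_map(long_term, cmap, alpha=0.5)
    print(long_term, summed_map(long_term))
```

Bins that stay highly correlated from frame to frame accumulate toward 1.0, so a large sum suggests spectral peaks that persist over time, which is the sense of tonal stability used in claim 1.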

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (high frequency components, spectral gain) having a magnitude that exceeds a given fixed threshold .
US6122610A
CLAIM 2
. The method of claim 1 , comprising the further step of : pre-filtering said input signal prior to applying the DFT to emphasize high frequency components (frequency bins) thereof .

US6122610A
CLAIM 9
. An apparatus for suppressing noise in an input signal that carries a combination of noise and speech , comprising : a signal preprocessor for dividing said input signal into signal blocks ;
a Discrete Fourier transform (DFT) processor for processing said signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
means for computing a magnitude of said complex-valued frequency domain representation to provide a frequency domain magnitude spectrum ;
an accumulator for accumulating said frequency domain magnitude spectrum into a perceptual-band spectrum comprising frequency bands of unequal width ;
wherein values of the frequency domain magnitude spectrum are accumulated from different frequency bands , each of which is correlated with an associated plurality of the DFT bins ;
a filter for filtering the perceptual-band spectrum to generate an estimate of a short-time perceptual-band spectrum comprising a current segment of the input signal ;
a speech/pause detector for determining whether said input signal is currently noise only or a combination of speech and noise ;
a noise spectrum estimator responsive to said speech/pause detector when the input signal is noise only for updating an estimate of a long term perceptual band spectrum of the noise based on the estimated short-time perceptual band spectrum of the input signal ;
a spectral gain (frequency bins) processor responsive to said noise spectrum estimator for determining a noise suppression frequency response ;
and a spectral shaping processor comprising an all-pole time-domain filter that is responsive to said spectral gain processor for time-domain shaping of a current block of the input signal to suppress noise therein .
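
The strong-tone search of claim 7 of US8990073B2 reduces to comparing each bin of the correlation map against a fixed threshold. A minimal sketch, with a hypothetical function name and an arbitrary example threshold that is not taken from the patent:

```python
def strong_tone_bins(cmap, threshold):
    """Indices of bins whose correlation-map value exceeds the fixed threshold."""
    return [k for k, value in enumerate(cmap) if value > threshold]


if __name__ == "__main__":
    # 0.8 is an illustrative threshold only.
    print(strong_tone_bins([0.1, 0.95, 0.5, 0.9], threshold=0.8))
```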

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (different frequency band, frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6122610A
CLAIM 1
. A method for suppressing noise in an input signal that carries a combination of noise and speech , comprising the steps of : dividing said input signal into signal blocks ;
applying a Discrete Fourier Transform (DFT) to the signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
converting the frequency domain representations of the signal blocks to magnitude-only signals ;
and averaging the magnitude-only signals across different frequency band (frequency bands, first frequency bands) s to provide an estimate of a short-time perceptual band spectrum of the input signal ;
wherein each of the different frequency bands is correlated with an associated plurality of the DFT bins ;
determining , at various points in time , whether said input signal is carrying noise only , or a combination of noise and speech , and , when the input signal is carrying noise only , using the corresponding estimated short-time perceptual band spectrum of the input signal to update an estimate of a long term perceptual band spectrum of the noise ;
determining a noise suppression frequency response based on said estimate of the long term perceptual band spectrum of the noise and the estimated short-time perceptual band spectrum of the input signal ;
and providing an all-pole time-domain filter in accordance with said noise suppression frequency response for time-domain shaping of a current block of the input signal to suppress noise therein .
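
The spectral diversity parameter of claim 24 of US8990073B2 (a per-band energy ratio between the current and previous frame, combined as a weighted sum over the bands above a given number) can be sketched as below. The plain current-over-previous ratio, the default unit weights, and all names are assumptions; the claim does not fix the form of the ratio or the weights.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None):
    """Weighted sum of per-band energy ratios for bands with index > min_band."""
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed default: equal weights
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        # Guard against an empty band in the previous frame.
        ratio = cur_energy[b] / prev_energy[b] if prev_energy[b] > 0.0 else 0.0
        total += weights[b] * ratio
    return total


if __name__ == "__main__":
    cur = [1.0, 2.0, 4.0, 8.0]
    prev = [1.0, 1.0, 2.0, 2.0]
    print(spectral_diversity(cur, prev, min_band=1))
```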

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (different frequency band, frequency bands) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6122610A
CLAIM 1
. A method for suppressing noise in an input signal that carries a combination of noise and speech , comprising the steps of : dividing said input signal into signal blocks ;
applying a Discrete Fourier Transform (DFT) to the signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
converting the frequency domain representations of the signal blocks to magnitude-only signals ;
and averaging the magnitude-only signals across different frequency band (frequency bands, first frequency bands) s to provide an estimate of a short-time perceptual band spectrum of the input signal ;
wherein each of the different frequency bands is correlated with an associated plurality of the DFT bins ;
determining , at various points in time , whether said input signal is carrying noise only , or a combination of noise and speech , and , when the input signal is carrying noise only , using the corresponding estimated short-time perceptual band spectrum of the input signal to update an estimate of a long term perceptual band spectrum of the noise ;
determining a noise suppression frequency response based on said estimate of the long term perceptual band spectrum of the noise and the estimated short-time perceptual band spectrum of the input signal ;
and providing an all-pole time-domain filter (first frequency) in accordance with said noise suppression frequency response for time-domain shaping of a current block of the input signal to suppress noise therein .
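
The noise character parameter of claim 28 of US8990073B2 can be illustrated in the same spirit: the bands are split into a first group and the remainder, one energy value is computed per group, their ratio is taken, and a long-term value is tracked. Summing band energies within each group and smoothing with a one-pole recursion are assumptions, as are the names n_first and alpha.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    return first / second if second > 0.0 else 0.0


def update_long_term(long_term, value, alpha):
    """Assumed long-term tracking: alpha * old + (1 - alpha) * new."""
    return alpha * long_term + (1.0 - alpha) * value


if __name__ == "__main__":
    nc = noise_character([2.0, 2.0, 1.0, 1.0], n_first=2)
    print(nc, update_long_term(0.0, nc, alpha=0.5))
```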

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (high frequency components, spectral gain) so as to produce a summed long-term correlation map .
US6122610A
CLAIM 2
. The method of claim 1 , comprising the further step of : pre-filtering said input signal prior to applying the DFT to emphasize high frequency components (frequency bins) thereof .

US6122610A
CLAIM 9
. An apparatus for suppressing noise in an input signal that carries a combination of noise and speech , comprising : a signal preprocessor for dividing said input signal into signal blocks ;
a Discrete Fourier transform (DFT) processor for processing said signal blocks over a number of DFT bins to provide a complex-valued frequency domain representation of each block ;
means for computing a magnitude of said complex-valued frequency domain representation to provide a frequency domain magnitude spectrum ;
an accumulator for accumulating said frequency domain magnitude spectrum into a perceptual-band spectrum comprising frequency bands of unequal width ;
wherein values of the frequency domain magnitude spectrum are accumulated from different frequency bands , each of which is correlated with an associated plurality of the DFT bins ;
a filter for filtering the perceptual-band spectrum to generate an estimate of a short-time perceptual-band spectrum comprising a current segment of the input signal ;
a speech/pause detector for determining whether said input signal is currently noise only or a combination of speech and noise ;
a noise spectrum estimator responsive to said speech/pause detector when the input signal is noise only for updating an estimate of a long term perceptual band spectrum of the noise based on the estimated short-time perceptual band spectrum of the input signal ;
a spectral gain (frequency bins) processor responsive to said noise spectrum estimator for determining a noise suppression frequency response ;
and a spectral shaping processor comprising an all-pole time-domain filter that is responsive to said spectral gain processor for time-domain shaping of a current block of the input signal to suppress noise therein .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6223090B1

Filed: 1998-08-24     Issued: 2001-04-24

Manikin positioning for acoustic measuring

(Original Assignee) US Air Force     (Current Assignee) US Air Force

Douglas S. Brungart
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency interval, frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (said motor) , and an initial value of the long term correlation map (frequency interval, frequency spectrum) .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motor (current frame) ized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (frequency interval, frequency spectrum) of the sound signal in the current frame (said motor) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motor (current frame) ized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (domain representations) with the previous residual spectrum , over frequency bins (frequency interval, frequency spectrum) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations (correlation value) of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .
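
For reference against the mapping above, the correlation-map step recited in claim 4 of US8990073B2 can be sketched as follows. This is a minimal illustration, not the patent's formula: the function names, the treatment of both spectrum endpoints as delimiting minima, and the particular normalization (cross term divided by the geometric mean of the two energies) are assumptions made for the example.

```python
import math

def find_minima(spectrum):
    """Indices of local minima; both endpoints are included as delimiters (assumption)."""
    n = len(spectrum)
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return minima

def correlation_map(current, previous):
    """Assign each peak's normalized correlation to all bins between its delimiting minima."""
    cor_map = [0.0] * len(current)
    minima = find_minima(current)
    for lo, hi in zip(minima, minima[1:]):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0  # score of this peak
        for k in range(lo, hi + 1):
            cor_map[k] = score  # a shared minimum bin takes the later peak's score
    return cor_map
```

A peak that is unchanged between frames scores 1.0 over its bins, while a peak with no counterpart in the previous residual spectrum scores 0.0. Nothing comparable appears in the quoted US6223090B1 claims, which compute phase differences and time delays rather than per-peak correlations.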

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (frequency interval, frequency spectrum) so as to produce a summed long-term correlation map .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 14
. The computer controlled closed-loop three-dimensional iterative positioning device of claim 12 for measuring head-related transfer functions on a manikin having left and right ears wherein means for transforming includes means for performing a fast Fourier transform (frequency bin) .
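
To make the claim 5 language concrete, the per-bin one-pole filtering and the summation can be sketched as below. The smoothing coefficient `alpha` and both function names are hypothetical; the claim text quoted above does not give a filter coefficient.

```python
def update_long_term_map(lt_map, cor_map, alpha=0.9):
    """One-pole (first-order recursive) filter applied independently to each frequency bin.

    alpha is an assumed smoothing coefficient, not a value taken from the patent.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cor_map)]

def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins to obtain a single scalar."""
    return sum(lt_map)

# Usage: feed one correlation map per frame and track the running sum.
lt = [0.0, 0.0]
lt = update_long_term_map(lt, [1.0, 0.5], alpha=0.5)
total = summed_long_term_map(lt)
```

The mapped term in US6223090B1 claim 14 is a fast Fourier transform, which produces frequency bins but does not by itself perform recursive smoothing across frames.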

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (frequency interval, frequency spectrum) having a magnitude that exceeds a given fixed threshold .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (near field) indicative of sound activity in the sound signal .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field (adaptive threshold) head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (phase difference) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference (background noise signal) between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (least square) .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square (SNR calculation) error between said estimated interaural time delay value at a selected frequency interval within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability (selected time) , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (includes means, first position) and a sixteenth order (includes means, first position) of linear prediction residual error energies .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position (second order, sixteenth order) thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time (pitch stability) delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 14
. The computer controlled closed-loop three-dimensional iterative positioning device of claim 12 for measuring head-related transfer functions on a manikin having left and right ears wherein means for transforming includes means (second order, sixteenth order) for performing a fast Fourier transform .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (second motor, said time) in order to distinguish a music signal from a background noise signal (phase difference) and prevent update of noise energy estimates on the music signal .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference (background noise signal) between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time (noise character parameter, noise character) delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US6223090B1
CLAIM 10
. The computer controlled closed-loop three-dimensional iterative positioning method of claim 1 for positioning an acoustic manikin for head-related transfer function measurements wherein said preselected time delay in said second motor (noise character parameter, noise character) ized rotating step is 5 microseconds .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (said motor) energy and an average frame energy .
US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motor (current frame) ized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (said motor) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motor (current frame) ized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .
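
The spectral diversity computation recited in claim 24 can be illustrated with a short sketch. It assumes that "higher than a given number" means band indices strictly greater than `min_band`, that the weights default to 1.0, and that a small `eps` guards against division by zero; none of these details come from the claim text.

```python
def spectral_diversity(e_cur, e_prev, min_band, weights=None, eps=1e-12):
    """Weighted sum of per-band energy ratios (current / previous frame)
    over bands with index greater than min_band.

    weights, eps, and the strict '>' comparison are assumptions for this sketch.
    """
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for i in range(min_band + 1, len(e_cur)):
        ratio = e_cur[i] / (e_prev[i] + eps)
        total += weights[i] * ratio
    return total
```

With band energies `[1, 2, 4, 8]` in the current frame, `[1, 1, 2, 2]` in the previous frame, and `min_band=1`, only bands 2 and 3 contribute, giving ratios of about 2 and 4.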

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (second motor, said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time (noise character parameter, noise character) delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US6223090B1
CLAIM 10
. The computer controlled closed-loop three-dimensional iterative positioning method of claim 1 for positioning an acoustic manikin for head-related transfer function measurements wherein said preselected time delay in said second motorized (noise character parameter, noise character) rotating step is 5 microseconds .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (second motor, said time) parameter (second motor, said time) inferior than a given fixed threshold .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time (noise character parameter, noise character) delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US6223090B1
CLAIM 10
. The computer controlled closed-loop three-dimensional iterative positioning method of claim 1 for positioning an acoustic manikin for head-related transfer function measurements wherein said preselected time delay in said second motorized (noise character parameter, noise character) rotating step is 5 microseconds .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency interval, frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (said motor) , and an initial value of the long-term correlation map .
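
The peak-detection and correlation-map elements of claim 30 can be pictured with a minimal Python sketch. The claim does not give a formula, so the normalized squared cross-correlation used here is an assumption, as are the function names `local_minima` and `correlation_map` and the choice to treat the first and last bins as minima.

```python
def local_minima(x):
    """Indices of local minima; the two end bins are always included (assumption)."""
    if len(x) < 2:
        return list(range(len(x)))
    idx = [0]
    for i in range(1, len(x) - 1):
        if x[i] < x[i - 1] and x[i] <= x[i + 1]:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def correlation_map(cur, prev):
    """Per-bin correlation between each peak of `cur` and the same bins of `prev`.

    A peak is the piece of the current residual spectrum between two
    successive minima. Every bin of a peak receives that peak's value;
    a minimum shared by two peaks takes the value of the later peak.
    """
    cmap = [0.0] * len(cur)
    mins = local_minima(cur)
    for lo, hi in zip(mins, mins[1:]):
        seg = range(lo, hi + 1)
        # Normalized squared cross-correlation (hypothetical choice).
        num = sum(cur[k] * prev[k] for k in seg) ** 2
        den = sum(cur[k] ** 2 for k in seg) * sum(prev[k] ** 2 for k in seg)
        val = num / den if den > 0 else 0.0
        for k in seg:
            cmap[k] = val
    return cmap
```

With this normalization each value lies between 0 and 1, and a peak whose shape is unchanged from the previous frame scores 1.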
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .
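
One way to read the time-delay determination in claim 7 of US6223090B1 is as a search over candidate interaural time delays, scoring each by a squared phase error that is wrapped into the ±180 degree range. The Python sketch below follows that reading only; the grid search, the function names `wrap_deg` and `estimate_itd`, and the modeling of a pure delay as a phase of 360·f·τ degrees are assumptions, not details taken from the reference.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def estimate_itd(freqs_hz, phase_diff_deg, candidates_s):
    """Return (best_delay, best_error) over a list of candidate delays.

    For each candidate delay tau, a pure delay predicts a phase difference
    of 360 * f * tau degrees at frequency f. The wrapped difference between
    measured and predicted phase is squared and summed over frequencies;
    the smaller error replaces the running estimate, as in the claim.
    """
    best_tau, best_err = None, float("inf")
    for tau in candidates_s:
        err = 0.0
        for f, p in zip(freqs_hz, phase_diff_deg):
            e = wrap_deg(p - 360.0 * f * tau)
            err += e * e
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err
```

Wrapping the error keeps a phase that has cycled past 180 degrees from being scored as a large mismatch.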

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motorized (current frame) stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency interval, frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (said motor) , and an initial value of the long-term correlation map .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motorized (current frame) stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (frequency interval, frequency spectrum) of the sound signal in the current frame (said motor) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
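
The three elements of claim 32 (locating minima, connecting them into a spectral floor, subtracting the floor) can be sketched in Python as follows. The claim only says the floor "connects" the minima, so the straight-line interpolation here is an assumption, as are the function names and the treatment of the end bins as minima.

```python
def local_minima(spec):
    """Indices of local minima; the two end bins are always included (assumption)."""
    if len(spec) < 2:
        return list(range(len(spec)))
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def spectral_floor(spec):
    """Piecewise-linear floor joining successive minima of the spectrum."""
    mins = local_minima(spec)
    floor = list(spec)
    for lo, hi in zip(mins, mins[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo)  # 0 at the left minimum, 1 at the right
            floor[k] = spec[lo] + t * (spec[hi] - spec[lo])
    return floor


def residual_spectrum(spec):
    """Spectrum minus its floor; zero at every located minimum."""
    return [s - f for s, f in zip(spec, spectral_floor(spec))]
```

In this simple form the residual can dip below zero where the spectrum rises more slowly than the straight line between two minima; how a real implementation treats such bins is not stated in the claim.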
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motorized (current frame) stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (frequency interval, frequency spectrum) so as to produce a summed long-term correlation map .
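
The filter and adder of claim 33, together with the update factor and initial value recited in claim 31, can be illustrated with a short Python sketch. The one-pole recursion and the value 0.9 for the update factor are assumptions chosen for illustration; the claims name the ingredients but not the exact recursion.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """Filter the correlation map bin by bin with a one-pole smoother.

    lt_map : long-term map from the previous frame (its initial value
             on the first frame)
    cmap   : correlation map of the current frame
    alpha  : update factor (hypothetical value)
    """
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


def summed_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)
```

Starting from an all-zero map and feeding the same strongly correlated frame repeatedly, the summed value climbs toward the number of bins, which is the behavior one would expect a tonal stability measure to show for a steady tone.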
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval (frequency spectrum, term correlation map, frequency bins) within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 14
. The computer controlled closed-loop three-dimensional iterative positioning device of claim 12 for measuring head-related transfer functions on a manikin having left and right ears wherein means for transforming includes means for performing a fast Fourier transform (frequency bin) .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (phase difference) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference (background noise signal) between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (phase difference) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6223090B1
CLAIM 1
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for near field head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
first computing from said frequency domain phase and amplitude values a phase difference (background noise signal) between said first ear of said manikin and a phase reference point in an azimuth axis relative to said sound source wherein said phase reference point is said second ear of said manikin and further including rotating said manikin in azimuth such that a determined time delay between said first and second ears is minimized and repeating said transforming and first computing ;
second computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said second ear of said manikin and further including rotating by 180 degrees said manikin about said azimuth axis and rotating said manikin about said selected roll axis such that said time delay between said first and second ears is minimized and repeating said transforming , and second computing steps ;
and third computing from said frequency domain phase and amplitude values a phase difference between said first ear of said manikin and a phase reference point wherein said phase reference point is said sound source and further including after said receiving step : ignoring said audio signal at said second ear of said manikin ;
after said computing step , rotating said acoustic manikin 180 degrees about said azimuth axis and ignoring said audio signal at said first ear of said manikin ;
computing from said frequency domain phase and amplitude values a phase difference between said second ear of said manikin and said sound source ;
computing phase difference between said sound source and said first ear before said rotating step and said sound source and said second ear after said rotating step ;
rotating said manikin about said selected pitch axis such that said time delay is minimized ;
and repeating said transforming and third computing steps .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (second microphones) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
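
The comparator of claim 38 can be pictured with a small Python sketch in which the decision threshold depends on the long-term SNR. The linear, clamped form of the threshold and every numeric constant below are hypothetical; the claim states only that the threshold is a function of SNR LT.

```python
def vad_threshold(snr_lt, slope=0.1, offset=2.0, lo=1.0, hi=10.0):
    """Threshold as a clamped linear function of the long-term SNR.

    All coefficients are illustrative placeholders, not values from the patent.
    """
    return min(hi, max(lo, slope * snr_lt + offset))


def is_active(snr_av, snr_lt):
    """Compare the average SNR of the frame with the SNR_LT-dependent threshold."""
    return snr_av > vad_threshold(snr_lt)
```

Tying the threshold to the long-term SNR lets the detector demand a larger margin in clean conditions and a smaller one in noisy conditions, or the reverse, depending on the sign of the slope that is chosen.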
US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones (average signal) connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motorized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (second motor, said time) of the sound signal for distinguishing a music signal from a background noise signal (phase difference) and preventing update of noise energy estimates .
US6223090B1
CLAIM 7
. A computer controlled closed-loop three-dimensional iterative positioning method for positioning an acoustic manikin for head-related transfer function measurements , said method comprising the steps of : providing a selectively positioned audio signal from a sound source ;
receiving said audio signal at first and second ears of said manikin ;
transforming time domain representations of said audio signal received by said manikin in a manikin selected axis first position thereof to frequency domain phase and amplitude values ;
computing from said frequency domain phase and amplitude values a phase difference (background noise signal) between said first ear of said manikin and a phase reference point ;
determining a time delay from said phase difference of said computing step comprising the steps of : providing estimated error and interaural time delay values ;
generating a modified linear least square error between said estimated interaural time delay value at a selected frequency interval within a frequency spectrum and a pure interaural time delay value such that the angular error in each interval is modified to be within the range −180 degrees to 180 degrees ;
comparing estimated error and linear least square error , the smaller value being reset as estimated error and the associated interaural time delay reset as the estimated interaural time delay ;
repeating above at consecutive frequency intervals within said entire frequency spectrum ;
and generating an interaural time delay for describing position of said manikin ;
rotating said manikin about said selected axis relative to said sound source in directionally determined response to time delay determinations from said determining step ;
and repeating said transforming , said computing , said determining and said rotating steps until a preselected time delay representing optimal position alignment about said selected axis is obtained relative to said sound source .

US6223090B1
CLAIM 10
. The computer controlled closed-loop three-dimensional iterative positioning method of claim 1 for positioning an acoustic manikin for head-related transfer function measurements wherein said preselected time delay in said second motorized (noise character parameter, noise character) rotating step is 5 microseconds .

US6223090B1
CLAIM 12
. A computer controlled closed-loop three-dimensional iterative positioning device for measuring near field head-related transfer functions on a manikin having left and right ears comprising : near field positioned audio signal sound source ;
first and second microphones connected in close proximity to said left and right ears for recording said audio signal ;
means for transforming said audio signal received at said left and right ears from time domain to frequency domain amplitude and phase ;
a signal analyzing device for electronically measuring phase difference between said left and right ears of said acoustic manikin ;
a motorized stand for securing said manikin ;
and a control computer electronically coupled to said motorized stand for calculating a time delay for reception of said audio signals at said left and right ears of said manikin in azimuth , roll and pitch , said control computer generating electronic signals responsive to said time (noise character parameter, noise character) delay and communicating said signals to said motorized stand for incrementally positioning said left and right ears equidistant from said sound source and repeating said incremental positioning within each azimuth , roll and pitch axis and repeating said incremental positioning between each azimuth , roll and pitch axis until a preselected time delay and optimal position is attained .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6173255B1

Filed: 1998-08-18     Issued: 2001-01-09

Synchronized overlap add voice processing using windows and one bit correlators

(Original Assignee) Lockheed Martin Corp     (Current Assignee) Lockheed Martin Corp ; Lockheed Martin Aerospace Corp

Dennis L. Wilson, James L. Wayman
US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (current sample) .
US6173255B1
CLAIM 4
. The method recited in claim 1 further comprising the step of processing the residual output signal using a differential processor that delays a sample of the audio signal , and subtracts the delayed sample from a current sample (linear prediction residual error energies) of the audio signal .
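The differential processor recited in US6173255B1 claim 4 can be illustrated with a minimal Python sketch. It assumes a delay of exactly one sample and a delay line that starts at zero; the function name `first_difference` is hypothetical and does not come from either patent.

```python
def first_difference(samples):
    """Subtract the one-sample-delayed signal from the current signal.

    Assumption: the delay line starts at zero, so the first output
    equals the first input.
    """
    delayed = 0.0
    out = []
    for current in samples:
        out.append(current - delayed)
        delayed = current  # becomes the delayed sample for the next step
    return out
```

For example, `first_difference([1.0, 3.0, 6.0])` yields `[1.0, 2.0, 3.0]`.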

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (analog audio signal) is detected .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (analog audio signal) from a background noise signal (analog audio signal) and prevent update of noise energy estimates on the music signal .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) and preventing update of noise energy estimates .
US6173255B1
CLAIM 9
. The method recited in claim 1 further comprising the steps of : decoding the compressed audio signal to produce a partially expanded digitized audio signal ;
decompressing the partially expanded digitized audio signal using a synchronized overlap add processor ;
differentially processing the decompressed partially expanded digitized audio signal to delay a sample thereof and add the delayed sample to a current sample of the decompressed digitized audio signal ;
and converting the decompressed digitized audio signal to an analog audio signal (music signal, background noise signal) .




US6449586B1

Filed: 1998-07-31     Issued: 2002-09-10

Control method of adaptive array and adaptive array apparatus

(Original Assignee) NEC Corp     (Current Assignee) NEC Corp

Osamu Hoshuyama
US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (least square) .
US6449586B1
CLAIM 17
. A control method for an adaptive array apparatus employing an adaptive filter and an iterative least square (SNR calculation) algorithm for receiving a particular signal source as a target signal source , among a plurality of signal sources , comprising the steps of : deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
and determining a step size of an adaptive algorithm in said adaptive filter and a forgetting constant on the basis of said first and second indicative values .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (adaptive filters) for a next frame .
US6449586B1
CLAIM 14
. An adaptive array apparatus having a generalized side lobe canceller type construction and having an adaptive filter for receiving a specific signal source as a target signal source among a plurality of signal sources , comprising : means for deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity which respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for comparing said first indicative value and a value derived by multiplying said second indicative value with a constant ;
and means for determining step sizes of adaptive algorithms in adaptive filters (noise estimates, noise estimator) provided in a multi-input canceller and a blocking matrix on the basis of the result of the comparison .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (other signals) .
US6449586B1
CLAIM 19
. An adaptive array apparatus which receives a plurality of input signals and outputs a target signal among the plurality of signals , the apparatus comprising : a first beam former which receives the plurality of input signals and produces a first output signal , the first beam former having higher sensitivity with respect to the target signal than a sensitivity with respect to other signals (linear prediction residual error energies) ;
a blocking matrix including a second beam former , the second beam former receives the plurality of input signals and produces a second output signal , the second beam former having higher sensitivity with respect to the target signal than a sensitivity with respect to other signals ;
the blocking matrix further including a first set of adaptive filters which receive the input signals and output adaptive filter output signals corresponding to beam formers having lower sensitivity with respect to , the target signal than a sensitivity with respect to other signals ;
the blocking matrix further including a first step size control circuit which compares the second output signal with the adaptive filter outputs signals and determines step sizes of adaptive algorithms in the first set of adaptive filters in the blocking matrix ;
a multi-input canceller including a second step size control circuit , the second step size control circuit compares the output of the second beam former and the adaptive filter output signals , and determines step sizes of adaptive algorithms in a second set of adaptive filters in the multi-input canceller ;
a subtractor which subtracts the outputs from the second set of adaptive filters of the multi-input canceller from the first outputs the target signal .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (non-linear function) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6449586B1
CLAIM 5
. An adaptive array apparatus having a generalized side lobe canceller type construction and having an adaptive filter for receiving a specific signal source as a target signal source among a plurality of signal sources , comprising : means for deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a step size determining value proportional to a value derived by converting a division of said first indicative value by said second indicative value with a non-linear function (noise character parameter) ;
and means for determining a step size of an adaptive algorithm in an adaptive filter provided in a multi-input canceller on the basis of said step size determining value .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (non-linear function) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (absolute value) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6449586B1
CLAIM 3
. An adaptive array apparatus as set forth in claim 1 , wherein each of said first and second indicative values relating to said signal amplitude is an average value of an absolute value (second group) of the signal .

US6449586B1
CLAIM 5
. An adaptive array apparatus having a generalized side lobe canceller type construction and having an adaptive filter for receiving a specific signal source as a target signal source among a plurality of signal sources , comprising : means for deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a step size determining value proportional to a value derived by converting a division of said first indicative value by said second indicative value with a non-linear function (noise character parameter) ;
and means for determining a step size of an adaptive algorithm in an adaptive filter provided in a multi-input canceller on the basis of said step size determining value .
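The four steps of US8990073B2 claim 28 (grouping the bands, computing two energies, taking their ratio, and tracking a long-term value) can be sketched in Python. This is a minimal illustration and not the patented implementation: the use of summed band energies, the direction of the ratio (first group over second group), the first-order smoothing for the long-term value, and the names `noise_character`, `update_long_term` and `alpha` are all assumptions.

```python
def noise_character(band_energies, n_first):
    """Ratio between the energy of the first n_first bands and the rest.

    Assumptions: energies are summed per group and the ratio is
    first group over second group.
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must split the bands into two non-empty groups")
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    if second_energy <= 0.0:
        return float("inf")  # guard against division by zero
    return first_energy / second_energy


def update_long_term(previous_long_term, current_value, alpha=0.9):
    """First-order recursive average (assumed form of the long-term value)."""
    return alpha * previous_long_term + (1.0 - alpha) * current_value
```

With band energies `[1.0, 1.0, 2.0, 4.0]` and `n_first=2`, the two group energies are 2.0 and 6.0, so the ratio is one third.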

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (non-linear function) inferior than a given fixed threshold .
US6449586B1
CLAIM 5
. An adaptive array apparatus having a generalized side lobe canceller type construction and having an adaptive filter for receiving a specific signal source as a target signal source among a plurality of signal sources , comprising : means for deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a step size determining value proportional to a value derived by converting a division of said first indicative value by said second indicative value with a non-linear function (noise character parameter) ;
and means for determining a step size of an adaptive algorithm in an adaptive filter provided in a multi-input canceller on the basis of said step size determining value .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (adaptive filters) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6449586B1
CLAIM 14
. An adaptive array apparatus having a generalized side lobe canceller type construction and having an adaptive filter for receiving a specific signal source as a target signal source among a plurality of signal sources , comprising : means for deriving a first indicative value relating to an amplitude of an output signal of a first beam former having higher sensitivity which respect to said target signal source than a sensitivity with respect to other signal sources ;
means for deriving a second indicative value relating to an amplitude of an output signal of a second beam former having lower sensitivity with respect to said target signal source than a sensitivity with respect to other signal sources ;
means for comparing said first indicative value and a value derived by multiplying said second indicative value with a constant ;
and means for determining step sizes of adaptive algorithms in adaptive filters (noise estimates, noise estimator) provided in a multi-input canceller and a blocking matrix on the basis of the result of the comparison .




JPH1198090A

Filed: 1998-07-24     Issued: 1999-04-09

Speech encoding/decoding apparatus (音声符号化/復号化装置)

(Original Assignee) Nec Corp; 日本電気株式会社     

Kiyoko Tanaka, 聖子 田中
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (スペクトル特性) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ, の情報) , the correlation map of a current frame , and an initial value of the long term correlation map .
JPH1198090A
CLAIM 1
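[Claim 1] A speech encoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced and unvoiced intervals of an input speech signal and outputs an identification control signal corresponding to voiced or unvoiced; an LPC analysis unit that, for the input speech signal input when the identification control signal indicates voiced, obtains LPC (Linear Predictive Coding) parameters by calculating the filter [のフィルタ] (update factor) coefficients of a synthesis filter based on a linear prediction analysis method, and converts the LPC parameters into LSP (Line Spectrum Pair) parameters; an LPC storage unit that, when the identification control signal switches from voiced to unvoiced, temporarily stores the LPC parameters of the immediately preceding voiced period; an LPF that, based on the LPC parameters, performs filtering to generate background noise supplied to the linear prediction analysis method by bringing the noise characteristics of the unvoiced state close to the noise characteristics of the voiced state; a high-efficiency encoding processing unit that performs encoding based on the LSP parameters and outputs an encoded speech signal or a noise signal; and a switch control unit that, according to the identification control signal, switches between and sends out as the output encoded signal the encoded speech signal when voiced and the noise signal when unvoiced.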

JPH1198090A
CLAIM 7
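[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.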

JPH1198090A
CLAIM 13
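[Claim 13] The speech decoding apparatus according to claim 11 or 12, wherein the high-efficiency encoding processing unit performs codebook matching, through a synthesis filter based on the LPC parameters, on information [の情報] (update factor) other than speech, including noise.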
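The last step of US8990073B2 claim 1 names three inputs to the long-term correlation map: an update factor, the correlation map of the current frame, and an initial value. A minimal Python sketch of one way to combine them is given here. The first-order recursive form, the default value of `alpha`, and the function name `update_long_term_map` are assumptions; the claim does not state the formula.

```python
def update_long_term_map(long_term_map, correlation_map, alpha=0.9):
    """One-frame update of a long-term correlation map.

    Assumption: a per-bin recursive average, where alpha plays the role
    of the update factor and long_term_map holds the initial value on
    the first call.
    """
    if len(long_term_map) != len(correlation_map):
        raise ValueError("maps must cover the same frequency bins")
    return [
        alpha * lt + (1.0 - alpha) * c
        for lt, c in zip(long_term_map, correlation_map)
    ]
```

Starting from an all-zero initial map and repeatedly feeding a bin whose correlation is 1.0, that bin of the long-term map rises toward 1.0 at a rate set by `alpha`, while bins with zero correlation stay at zero.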

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (スペクトル特性) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JPH1198090A
CLAIM 7
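[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.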
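The three steps of US8990073B2 claim 2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched in Python as follows. The sketch assumes straight-line interpolation between minima and treats the first and last bins as minima so that the floor covers the whole spectrum; these choices and all function names are illustrative assumptions, not details taken from the patent.

```python
def find_minima(spectrum):
    """Return indices of local minima of a magnitude spectrum.

    Assumption: the first and last bins are always treated as minima so
    that the spectral floor covers every bin.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    minima = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            minima.append(i)
    minima.append(n - 1)
    return minima


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the spectral floor from the spectrum; also return the minima."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

For the toy spectrum `[1.0, 3.0, 1.0, 4.0, 2.0]` the minima are at bins 0, 2 and 4, and the residual is `[0.0, 2.0, 0.0, 2.5, 0.0]`: zero at every minimum and positive on the peaks between them.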

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (スペクトル特性) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
JPH1198090A
CLAIM 7
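[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.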
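Locating a maximum between each pair of consecutive minima, as recited in US8990073B2 claim 3, can be sketched in a few lines of Python. The function name `peak_positions` is hypothetical, the minima indices are assumed to be supplied by the caller in ascending order, and ties are resolved in favor of the lowest bin, which is an arbitrary choice.

```python
def peak_positions(residual, minima):
    """Return the bin index of the maximum between each pair of minima."""
    peaks = []
    for a, b in zip(minima, minima[1:]):
        segment = residual[a:b + 1]
        # list.index returns the first occurrence, so ties go to the lowest bin
        peaks.append(a + segment.index(max(segment)))
    return peaks
```

For the residual `[0.0, 2.0, 0.0, 2.5, 0.0]` with minima at bins 0, 2 and 4, the function returns `[1, 3]`.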

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (スペクトル特性) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
JPH1198090A
CLAIM 7
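[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.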
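The correlation map of US8990073B2 claim 4 assigns one normalized correlation value to every frequency bin of a peak. The Python sketch below illustrates the idea under stated assumptions: the normalization is a cosine-style ratio (the claim does not give the formula), the minima indices are supplied by the caller in ascending order, and the name `correlation_map` is hypothetical.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map of peak-wise normalized correlation between two residuals.

    Assumption: normalized correlation = sum(c*p) / sqrt(sum(c*c) * sum(p*p)),
    computed over the bins between two consecutive minima.
    """
    cmap = [0.0] * len(current)
    for a, b in zip(minima, minima[1:]):
        cur = current[a:b + 1]
        prev = previous[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0.0 else 0.0
        # Every bin of the peak receives the peak's score; a minimum shared
        # by two adjacent peaks keeps the score of the later peak.
        for i in range(a, b + 1):
            cmap[i] = score
    return cmap
```

A peak that keeps its shape from the previous frame scores 1.0 across its bins, while a peak with no counterpart in the previous frame scores 0.0.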

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound (音声以外) signal is detected .
JPH1198090A
CLAIM 3
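[Claim 3] The speech encoding apparatus according to claim 2, wherein the high-efficiency encoding processing unit performs matching against the codebook, through a synthesis filter based on the LPC parameters, on information other than speech [音声以外] (tonal sound), including noise.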

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (信号又) .
JPH1198090A
CLAIM 1
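[Claim 1] A speech encoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced and unvoiced intervals of an input speech signal and outputs an identification control signal corresponding to voiced or unvoiced; an LPC analysis unit that, for the input speech signal input when the identification control signal indicates voiced, obtains LPC (Linear Predictive Coding) parameters by calculating the filter coefficients of a synthesis filter based on a linear prediction analysis method, and converts the LPC parameters into LSP (Line Spectrum Pair) parameters; an LPC storage unit that, when the identification control signal switches from voiced to unvoiced, temporarily stores the LPC parameters of the immediately preceding voiced period; an LPF that, based on the LPC parameters, performs filtering to generate background noise supplied to the linear prediction analysis method by bringing the noise characteristics of the unvoiced state close to the noise characteristics of the voiced state; a high-efficiency encoding processing unit that performs encoding based on the LSP parameters and outputs an encoded speech signal or [信号又] (SNR calculation) a noise signal; and a switch control unit that, according to the identification control signal, switches between and sends out as the output encoded signal the encoded speech signal when voiced and the noise signal when unvoiced.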

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (平均化, 記復号) energy .
JPH1198090A
CLAIM 11
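[Claim 11] A speech decoding apparatus that receives, as an input encoded signal, the output encoded signal from the speech encoding apparatus according to any one of claims 1 to 8 and decodes it, the speech decoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced or unvoiced for the input encoded signal based on the unique word control signal and outputs a control signal corresponding to voiced or unvoiced; a background noise update unit that stores the background noise of the input encoded signal when the control signal indicates unvoiced; an LPC decoding unit that LPC-decodes the input encoded signal when the control signal indicates voiced; an LPC storage unit that stores the LPC parameters used in the LPC decoding; a high-efficiency decoding processing unit that performs decoding based on the LSP parameters and outputs a decoded speech signal or a noise signal; an HPF that, based on the LPC parameters, filters the noise signal when unvoiced and outputs it as background noise; and a switch control unit that switches between and sends out the decoded [記復号] (average signal, average frame) speech signal when voiced and the background noise when unvoiced.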

JPH1198090A
CLAIM 16
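[Claim 16] The speech decoding apparatus according to claim 15, wherein, when the background noise update unit updates the background noise, the background noise control unit calculates an average value obtained by averaging [平均化] (average signal, average frame) one frame of the background noise, transmitted at a fixed period from the speech encoding apparatus side, with the data sent immediately before.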

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (スペクトル特性) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ, の情報) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH1198090A
CLAIM 1
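[Claim 1] A speech encoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced and unvoiced intervals of an input speech signal and outputs an identification control signal corresponding to voiced or unvoiced; an LPC analysis unit that, for the input speech signal input when the identification control signal indicates voiced, obtains LPC (Linear Predictive Coding) parameters by calculating the filter [のフィルタ] (update factor) coefficients of a synthesis filter based on a linear prediction analysis method, and converts the LPC parameters into LSP (Line Spectrum Pair) parameters; an LPC storage unit that, when the identification control signal switches from voiced to unvoiced, temporarily stores the LPC parameters of the immediately preceding voiced period; an LPF that, based on the LPC parameters, performs filtering to generate background noise supplied to the linear prediction analysis method by bringing the noise characteristics of the unvoiced state close to the noise characteristics of the voiced state; a high-efficiency encoding processing unit that performs encoding based on the LSP parameters and outputs an encoded speech signal or a noise signal; and a switch control unit that, according to the identification control signal, switches between and sends out as the output encoded signal the encoded speech signal when voiced and the noise signal when unvoiced.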

JPH1198090A
CLAIM 7
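[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.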

JPH1198090A
CLAIM 13
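[Claim 13] The speech decoding apparatus according to claim 11 or 12, wherein the high-efficiency encoding processing unit performs codebook matching, through a synthesis filter based on the LPC parameters, on information [の情報] (update factor) other than speech, including noise.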

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (スペクトル特性) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor (のフィルタ, の情報) , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH1198090A
CLAIM 1
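[Claim 1] A speech encoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced and unvoiced intervals of an input speech signal and outputs an identification control signal corresponding to voiced or unvoiced; an LPC analysis unit that, for the input speech signal input when the identification control signal indicates voiced, obtains LPC (Linear Predictive Coding) parameters by calculating the filter [のフィルタ] (update factor) coefficients of a synthesis filter based on a linear prediction analysis method, and converts the LPC parameters into LSP (Line Spectrum Pair) parameters; an LPC storage unit that, when the identification control signal switches from voiced to unvoiced, temporarily stores the LPC parameters of the immediately preceding voiced period; an LPF that, based on the LPC parameters, performs filtering to generate background noise supplied to the linear prediction analysis method by bringing the noise characteristics of the unvoiced state close to the noise characteristics of the voiced state; a high-efficiency encoding processing unit that performs encoding based on the LSP parameters and outputs an encoded speech signal or a noise signal; and a switch control unit that, according to the identification control signal, switches between and sends out as the output encoded signal the encoded speech signal when voiced and the noise signal when unvoiced.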

JPH1198090A
CLAIM 7
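[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.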

JPH1198090A
CLAIM 13
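[Claim 13] The speech decoding apparatus according to claim 11 or 12, wherein the high-efficiency encoding processing unit performs codebook matching, through a synthesis filter based on the LPC parameters, on information [の情報] (update factor) other than speech, including noise.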

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (スペクトル特性) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JPH1198090A
CLAIM 7
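[Claim 7] The speech encoding apparatus according to any one of claims 1 to 6, wherein the LPF has a VOX (Voice Operated Transmitter) function that distinguishes voiced, indicating utterance by the sender, from unvoiced, indicating no utterance, and, based on the CELP (Code-book Excited Linear Prediction) method, generates and outputs the background noise by approximating the spectral characteristics [スペクトル特性] (current residual spectrum) of the noise accompanying the unvoiced state to the spectral characteristics of the noise accompanying speech.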

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator (フレーム) of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JPH1198090A
CLAIM 9
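[Claim 9] A speech encoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced and unvoiced intervals of an input speech signal and outputs an identification control signal corresponding to voiced or unvoiced; an LPC analysis unit that obtains LPC (Linear Predictive Coding) parameters by calculating the filter coefficients of a synthesis filter for the input speech signal based on a linear prediction analysis method, and converts the LPC parameters into LSP (Line Spectrum Pair) parameters; a high-efficiency encoding processing unit that has a synthesis filter with the filter coefficients calculated by the LPC analysis unit and a codebook matching processing unit including two kinds of codebooks, namely an LPC codebook that encodes the LPC parameters and an excitation codebook that encodes other speech characteristic parameters, and that outputs an encoded speech signal obtained by performing, as encoding based on the LPC parameters, codebook matching processing in which the filter coefficients are encoded with parameters of the LPC codebook and the excitation codebook is then searched so as to minimize the difference between the input speech signal and the vectors of the excitation codebook filtered by the synthesis filter; a unique word generation unit that generates and outputs a unique word control signal corresponding to the voiced/unvoiced identification control signal; an unvoiced encoding conversion unit that, in response to the unvoiced identification control signal, outputs an unvoiced-period parameter encoded signal obtained by encoding and converting the LPC parameters of the unvoiced period into a predetermined frame [フレーム] (tonal stability estimator) format; and a switch control unit that, according to the identification control signal, switches between and sends out, as the output encoded signal for voiced and unvoiced respectively, the encoded speech signal, the unique word control signal, and background noise based on the unvoiced-period parameter encoded signal.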

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (平均化, 記復号) to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
JPH1198090A
CLAIM 11
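[Claim 11] A speech decoding apparatus that receives, as an input encoded signal, the output encoded signal from the speech encoding apparatus according to any one of claims 1 to 8 and decodes it, the speech decoding apparatus comprising: a voiced/unvoiced discrimination unit that discriminates voiced or unvoiced for the input encoded signal based on the unique word control signal and outputs a control signal corresponding to voiced or unvoiced; a background noise update unit that stores the background noise of the input encoded signal when the control signal indicates unvoiced; an LPC decoding unit that LPC-decodes the input encoded signal when the control signal indicates voiced; an LPC storage unit that stores the LPC parameters used in the LPC decoding; a high-efficiency decoding processing unit that performs decoding based on the LSP parameters and outputs a decoded speech signal or a noise signal; an HPF that, based on the LPC parameters, filters the noise signal when unvoiced and outputs it as background noise; and a switch control unit that switches between and sends out the decoded [記復号] (average signal, average frame) speech signal when voiced and the background noise when unvoiced.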

JPH1198090A
CLAIM 16
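[Claim 16] The speech decoding device according to claim 15, wherein the background noise control unit, when the background noise update unit updates the background noise, calculates an average value by averaging (平均化; average signal, average frame), based on one frame of the background noise transmitted at a fixed period from the speech encoding device side, with the data sent immediately before.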




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5990405A

Filed: 1998-07-08     Issued: 1999-11-23

System and method for generating and controlling a simulated musical concert experience

(Original Assignee) Gibson Guitar Corp     (Current Assignee) Bank of America NA ; Gibson Brands Inc

Don R. Auten, Richard T. Akers, Richard Gembar
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (audio reproduction) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .
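
The last step of claim 1 of US8990073B2 above (a long-term correlation map calculated from an update factor, the correlation map of the current frame, and an initial value) can be read as a first-order recursive smoothing. The chart does not reproduce the formula, so the recursion itself, the default `alpha`, the zero initial value, and the function names below are assumptions made for illustration only.

```python
def update_long_term_map(lt_map, cor_map, alpha):
    """One frame of recursive smoothing (assumed form):
    lt <- alpha * lt + (1 - alpha) * cor, applied per frequency bin."""
    if len(lt_map) != len(cor_map):
        raise ValueError("maps must have the same number of frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cor_map)]


def long_term_map(cor_maps, alpha=0.9, initial=0.0):
    """Fold a sequence of per-frame correlation maps into a long-term map.

    `alpha` plays the role of the update factor and `initial` the role of
    the initial value; both defaults are hypothetical.
    """
    if not cor_maps:
        return []
    lt_map = [initial] * len(cor_maps[0])
    for cor_map in cor_maps:
        lt_map = update_long_term_map(lt_map, cor_map, alpha)
    return lt_map
```

Under this reading, bins that stay highly correlated from frame to frame drift toward 1, while bins that correlate only sporadically stay low, which is what makes the map usable as a tonal stability indicator.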

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (audio reproduction) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .
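
The three steps recited in claim 2 of US8990073B2 above (search for minima, connect them to estimate the spectral floor, subtract the floor) can be sketched as follows. This is a minimal illustration, not the patentee's implementation: the straight-line connection between minima, the treatment of the first and last bins as minima, and all function names are assumptions.

```python
def local_minima(spectrum):
    """Indices of local minima; both endpoints are always included (assumption)."""
    n = len(spectrum)
    if n == 0:
        return []
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima): the spectrum minus its floor, plus the minima used."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)], minima
```

For the toy spectrum `[1.0, 3.0, 1.0, 5.0, 2.0]`, the minima sit at bins 0, 2 and 4, and the residual is zero at each minimum and positive in between.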

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (left channels) of the current residual spectrum .
US5990405A
CLAIM 21
. A system for allowing a player using a guitar to control simulated participation in a musical concert during synchronous playback of a pre-recorded concert video track , pre-recorded left and right concert sound tracks , and a separate pre-recorded guitar track , the system comprising : a . an audio/video playback device adapted to play the pre-recorded video track through a video source output in synchronization with playback of the pre-recorded left and right concert sound tracks through left and right channel source outputs and the pre-recorded guitar track through a guitar track source output ;
b . a video display connected to the video source output ;
c . an audio interface box having an instrument input connected to an instrument output on the guitar , an instrument audio output , a guitar track input , and a controlled guitar track output ;
d . an audio mixer having a mixer source input connected to the left and right channel source outputs and to the guitar track source output , a mixer instrument input connected to the instrument audio output , a guitar track output connected to the guitar track input on the interface box and adapted to output the pre-recorded guitar track , a controlled guitar track input connected to the controlled guitar track output ;
and a mixer audio output having right and left channels (two consecutive minima) , the mixer audio output providing a system audio signal responsive to instrument audio signals at the mixer instrument input , to the guitar track , and to the left and right sound tracks ;
e . left and right audio speakers connected to respective left and right channels of the mixer audio output ;
f . the interface box further comprising a guitar channel control circuit operable to control a signal level of the guitar track at the controlled guitar track output in response to variation in instrument audio signals generated at the instrument audio output when the guitar is played ;
and g . whereby the player can hear the left and right pre-recorded concert sound tracks and the guitar track while viewing the video track and can control a sound volume of the guitar track by playing the guitar .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima (left channels) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5990405A
CLAIM 21
. A system for allowing a player using a guitar to control simulated participation in a musical concert during synchronous playback of a pre-recorded concert video track , pre-recorded left and right concert sound tracks , and a separate pre-recorded guitar track , the system comprising : a . an audio/video playback device adapted to play the pre-recorded video track through a video source output in synchronization with playback of the pre-recorded left and right concert sound tracks through left and right channel source outputs and the pre-recorded guitar track through a guitar track source output ;
b . a video display connected to the video source output ;
c . an audio interface box having an instrument input connected to an instrument output on the guitar , an instrument audio output , a guitar track input , and a controlled guitar track output ;
d . an audio mixer having a mixer source input connected to the left and right channel source outputs and to the guitar track source output , a mixer instrument input connected to the instrument audio output , a guitar track output connected to the guitar track input on the interface box and adapted to output the pre-recorded guitar track , a controlled guitar track input connected to the controlled guitar track output ;
and a mixer audio output having right and left channels (two consecutive minima) , the mixer audio output providing a system audio signal responsive to instrument audio signals at the mixer instrument input , to the guitar track , and to the left and right sound tracks ;
e . left and right audio speakers connected to respective left and right channels of the mixer audio output ;
f . the interface box further comprising a guitar channel control circuit operable to control a signal level of the guitar track at the controlled guitar track output in response to variation in instrument audio signals generated at the instrument audio output when the guitar is played ;
and g . whereby the player can hear the left and right pre-recorded concert sound tracks and the guitar track while viewing the video track and can control a sound volume of the guitar track by playing the guitar .
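
Claim 4 of US8990073B2 above describes building the correlation map peak by peak: a normalized correlation with the previous residual spectrum is computed over the bins between the two minima that delimit each peak, and that value is assigned to every bin of the peak. A minimal sketch follows. The cosine-style normalization is an assumption (the chart does not give the exact formula), as are the function name and the choice to let a later peak overwrite the bin it shares with the preceding one.

```python
import math


def correlation_map(curr_res, prev_res, minima):
    """Per-bin correlation map built from per-peak normalized correlations.

    curr_res, prev_res: residual spectra of the current and previous frame.
    minima: sorted indices of the minima of curr_res that delimit the peaks.
    """
    cmap = [0.0] * len(curr_res)
    for a, b in zip(minima, minima[1:]):
        cur = curr_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Score of this peak; 0.0 when either segment has no energy.
        score = num / den if den > 0.0 else 0.0
        for k in range(a, b + 1):
            cmap[k] = score
    return cmap
```

A peak whose shape is unchanged between frames scores 1.0, while a peak with no counterpart in the previous residual spectrum scores 0.0.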

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (audio reproduction) .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (audio reproduction) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (audio reproduction) comprises comparing the summed long-term correlation map with an adaptive threshold (video source, back device) indicative of sound activity (signal level) in the sound signal .
US5990405A
CLAIM 1
. A system for electronically simulating participation by a user in a pre-recorded musical performance comprising : a . a musical instrument , the musical instrument generating an instrument audio signal at an instrument audio output , the instrument audio signal varying in response to operation of the instrument by the user of the system ;
b . a video source (adaptive threshold) providing a source video signal at a source video output , the source video signal representing a video portion of the pre-recorded musical performance ;
c . a video display responsive to the source video signal whereby the user can view the video portion of the pre-recorded musical performance on the video display ;
d . an audio source providing a source audio signal at a source audio output , the source audio signal representing an audio portion of the pre-recorded musical performance , the audio portion including an instrument sound track containing pre-recorded musical sounds that would be generated by the musical instrument in the pre-recorded musical performance ;
e . a system interface device having a first audio input electrically connected to the instrument audio output , a second audio input electrically connected to the source audio output , and a first interface audio output ;
f . the system interface device including a source audio control circuit responsive to the instrument audio signal , whereby a characteristic of the source audio signal is controlled in response to operation of the musical instrument by the user to provide a controlled source audio signal at the first interface audio output ;
and g . an audio playback transducer responsive to the controlled source audio signal such that the user can listen to the audio portion of the pre-recorded musical performance on the transducer , in synchronization with the video portion .

US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 13
. A system for simulating participation of a user playing a musical instrument in a pre-recorded musical performance having audio and video portions , the musical instrument producing instrument audio signals at an instrument audio output when the instrument is played , comprising : a . a source playback device (adaptive threshold) for playback of the audio and video portions of the pre-recorded musical performance through corresponding source audio and source video outputs ;
b . a source audio control device for controlling one or more characteristics of the audio portion of the pre-recorded musical performance during playback , the source audio control means operably connected to the source audio output and to the instrument audio output and having a controlled audio output ;
and c . the source audio control device is responsive to the instrument audio signals whereby at least one characteristic of the audio portion of the pre-recorded musical performance is controlled by playing of the musical instrument by the user .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 10
. A method for detecting sound activity (signal level) in a sound signal (audio reproduction) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (audio reproduction) is detected .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (signal level) in the sound signal (audio reproduction) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection comprises detecting the sound signal (audio reproduction) based on a frequency dependent signal-to-noise ratio (SNR) .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection comprises comparing an average signal-to-noise ratio (SNR_av) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR_LT) .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .
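
The comparison recited in claim 14 of US8990073B2 above (an average SNR tested against a threshold that is a function of the long-term SNR) can be illustrated with a short sketch. The linear threshold function and its `slope` and `offset` values are purely hypothetical placeholders, since the chart does not state the actual function; treating equality as inactive is likewise an assumption.

```python
def vad_threshold(snr_lt, slope=0.1, offset=2.0):
    """Hypothetical threshold: a linear function of the long-term SNR (dB)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Active when the average SNR exceeds the threshold, inactive otherwise.

    The case where the two values are equal is treated as inactive here,
    which is an assumption.
    """
    return snr_av > vad_threshold(snr_lt)
```

Tying the threshold to the long-term SNR lets the detector demand a larger margin in clean conditions and a smaller one in noisy conditions, instead of relying on a single fixed level.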

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection in the sound signal (audio reproduction) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (signal level) detection further comprises updating the noise estimates for a next frame (audio outputs) .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 20
. The system of claim 13 further comprising an audio mixer , the mixer operably connected between the source audio , instrument audio , and controlled audio outputs (next frame) and the left and right speakers .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame (audio outputs) comprises calculating an update decision based on at least one of a pitch stability (control device) , a voicing , a non-stationarity parameter of the sound signal (audio reproduction) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US5990405A
CLAIM 13
. A system for simulating participation of a user playing a musical instrument in a pre-recorded musical performance having audio and video portions , the musical instrument producing instrument audio signals at an instrument audio output when the instrument is played , comprising : a . a source playback device for playback of the audio and video portions of the pre-recorded musical performance through corresponding source audio and source video outputs ;
b . a source audio control device (pitch stability) for controlling one or more characteristics of the audio portion of the pre-recorded musical performance during playback , the source audio control means operably connected to the source audio output and to the instrument audio output and having a controlled audio output ;
and c . the source audio control device is responsive to the instrument audio signals whereby at least one characteristic of the audio portion of the pre-recorded musical performance is controlled by playing of the musical instrument by the user .

US5990405A
CLAIM 20
. The system of claim 13 further comprising an audio mixer , the mixer operably connected between the source audio , instrument audio , and controlled audio outputs (next frame) and the left and right speakers .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (audio reproduction) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR_av) is inferior to the calculated threshold .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (audio reproduction) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR_av) is larger than the calculated threshold .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (audio reproduction) prevents updating of noise energy estimates when a music signal is detected .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (audio reproduction) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .
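
Claim 24 of US8990073B2 above defines the spectral diversity parameter as a weighted sum, over the frequency bands above a given band number, of the ratio between current-frame and previous-frame energy. A minimal sketch, assuming per-band energies are already available; the weights, the `eps` guard against division by zero, and the function name are illustrative assumptions not taken from the chart.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights, eps=1e-12):
    """Weighted sum of per-band energy ratios for bands with index > min_band.

    curr_energy, prev_energy: per-band energies of the current and previous frame.
    weights: one weight per band (hypothetical values).
    eps: small floor that avoids dividing by a zero previous-frame energy.
    """
    total = 0.0
    for i in range(min_band + 1, len(curr_energy)):
        ratio = curr_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total
```

With `curr_energy = [1.0, 2.0, 4.0, 8.0]`, `prev_energy = [1.0, 1.0, 2.0, 2.0]`, `min_band = 1` and equal weights of 0.5, only bands 2 and 3 contribute (ratios 2 and 4).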

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (audio reproduction) .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (audio reproduction) and the complementary non-stationarity parameter .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (audio reproduction) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (audio reproduction) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (audio reproduction) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (audio reproduction) .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 35
. A device for detecting sound activity (signal level) in a sound signal (audio reproduction) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 36
. A device for detecting sound activity (signal level) in a sound signal (audio reproduction) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (signal level) detector .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (signal level) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (signal level) detector .
US5990405A
CLAIM 2
. The system of claim 1 whereby the characteristic of the source audio signal controlled by the source audio control circuit is a source audio signal level (sound activity, sound activity detection, detecting sound activity) .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (audio reproduction) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (audio reproduction) .
US5990405A
CLAIM 29
. The apparatus of claim 28 further comprising a bypass circuit controlled by a bypass switch and operatively connected to the control circuit , the switch having a bypass position in which the bypass circuit inhibits generation of the controlled instrument track signal and allows audio reproduction (sound signal) of the audio signals generated by the musical instrument during playback of video track and the concert sound track .




US6137349A

Filed: 1998-07-02     Issued: 2000-10-24

Filter combination for sampling rate conversion

(Original Assignee) TDK Micronas GmbH     (Current Assignee) TDK Micronas GmbH

Andreas Menkhoff, Herbert Alrutz
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (sampling rate) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate (frequency bin basis) conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time-varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .
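
The two steps recited in claim 5 of US8990073B2 (one-pole filtering of the correlation map per frequency bin, then summing over the bins) can be illustrated with a minimal sketch. The function names and the smoothing factor `alpha` are assumptions made for illustration only; the claim does not state a value for the filter coefficient.

```python
from typing import List, Sequence


def update_long_term_map(prev_lt: Sequence[float],
                         cor_map: Sequence[float],
                         alpha: float = 0.9) -> List[float]:
    """One-pole (first-order recursive) smoothing applied bin by bin.

    alpha is a hypothetical update factor; it is not taken from the patent.
    """
    if len(prev_lt) != len(cor_map):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_lt, cor_map)]


def summed_long_term_map(lt_map: Sequence[float]) -> float:
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)


lt = update_long_term_map([0.0, 1.0], [1.0, 1.0], alpha=0.5)
total = summed_long_term_map(lt)
```

Here `lt` is carried from frame to frame, and `total` is the single per-frame figure that a later decision stage could compare against a threshold.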

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (second order) and a sixteenth order of linear prediction residual error energies .
US6137349A
CLAIM 10
. A method for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence (d2) comprising the steps of : low-pass filtering said input data sequence (d1) using a second order (second order) low-pass filter having a transfer function H(z) = (1 - z^-1)^2 and having an attenuation characteristic of at least one first attenuation value substantially corresponding to between one half and 1 . 5 times a sampling frequency (f1) of a first clock signal to provide a first data sequence (d4) ;
doubling the number of samples of said input data sequence (d1) by filtering said first data sequence (d4) using a time invariant interpolation filter to provide a second data sequence (d5) corresponding to said first clock signal ;
performing a time-varying interpolation of said second data sequence (d5) using a linear interpolator having at least one attenuation value (a4) substantially corresponding to twice the sampling frequency (f1) to produce a third data sequence (d6) ;
reducing the sampling rate of said third data sequence (d6) by an integral factor greater than 1 to produce a fourth data sequence (d3) , and buffering said fourth data sequence (d3) according to said sampling frequency (f1) and a second clock signal (f2) for providing said output data sequence at a sampling rate corresponding to said second clock signal (f2) .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time (noise character parameter) -varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (input low pass filter, fourth data) and an average frame energy .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time-varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (second energy, current frame energy, second energy values) (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .

US6137349A
CLAIM 10
. A method for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence (d2) comprising the steps of : low-pass filtering said input data sequence (d1) using a second order low-pass filter having a transfer function H(z) = (1 - z^-1)^2 and having an attenuation characteristic of at least one first attenuation value substantially corresponding to between one half and 1 . 5 times a sampling frequency (f1) of a first clock signal to provide a first data sequence (d4) ;
doubling the number of samples of said input data sequence (d1) by filtering said first data sequence (d4) using a time invariant interpolation filter to provide a second data sequence (d5) corresponding to said first clock signal ;
performing a time-varying interpolation of said second data sequence (d5) using a linear interpolator having at least one attenuation value (a4) substantially corresponding to twice the sampling frequency (f1) to produce a third data sequence (d6) ;
reducing the sampling rate of said third data sequence (d6) by an integral factor greater than 1 to produce a fourth data (second energy, current frame energy, second energy values) sequence (d3) , and buffering said fourth data sequence (d3) according to said sampling frequency (f1) and a second clock signal (f2) for providing said output data sequence at a sampling rate corresponding to said second clock signal (f2) .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (one second) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (input low pass filter, fourth data) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second (first frequency) attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time (noise character parameter) -varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (second energy, current frame energy, second energy values) (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .

US6137349A
CLAIM 10
. A method for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence (d2) comprising the steps of : low-pass filtering said input data sequence (d1) using a second order low-pass filter having a transfer function H(z) = (1 - z^-1)^2 and having an attenuation characteristic of at least one first attenuation value substantially corresponding to between one half and 1 . 5 times a sampling frequency (f1) of a first clock signal to provide a first data sequence (d4) ;
doubling the number of samples of said input data sequence (d1) by filtering said first data sequence (d4) using a time invariant interpolation filter to provide a second data sequence (d5) corresponding to said first clock signal ;
performing a time-varying interpolation of said second data sequence (d5) using a linear interpolator having at least one attenuation value (a4) substantially corresponding to twice the sampling frequency (f1) to produce a third data sequence (d6) ;
reducing the sampling rate of said third data sequence (d6) by an integral factor greater than 1 to produce a fourth data (second energy, current frame energy, second energy values) sequence (d3) , and buffering said fourth data sequence (d3) according to said sampling frequency (f1) and a second clock signal (f2) for providing said output data sequence at a sampling rate corresponding to said second clock signal (f2) .
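
The noise character parameter of claim 28 of US8990073B2 (split the bands into two groups, take an energy value per group, form their ratio, then track a long-term value) can be sketched as follows. This is only an illustration under stated assumptions: the group energies are taken as plain sums, the ratio is first group over second group, and the long-term value uses a first-order smoother with a hypothetical factor `alpha`. None of these choices is specified in the claim text.

```python
from typing import Sequence


def noise_character(band_energies: Sequence[float], n_first: int) -> float:
    """Ratio of the energy in the first n_first bands to the energy in the rest.

    Summation and ratio direction are assumptions for illustration.
    """
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave both groups non-empty")
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    if second_energy <= 0.0:
        raise ValueError("second group energy must be positive")
    return first_energy / second_energy


def update_long_term(prev: float, current: float, alpha: float = 0.9) -> float:
    """Hypothetical first-order smoothing used as the long-term value."""
    return alpha * prev + (1.0 - alpha) * current


ratio = noise_character([1.0, 3.0, 2.0, 2.0], n_first=2)
long_term = update_long_term(0.0, ratio, alpha=0.5)
```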

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time (noise character parameter) -varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (sampling rate) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6137349A
CLAIM 1
. A filter combination circuit for performing a sampling rate (frequency bin basis) conversion of an input data sequence (d1) into an output data sequence , comprising : an input low-pass filter (1) receiving said input data sequence (d1) for outputting a first data sequence (d4) in response to a digitization clock frequency of a clock signal (f1) , said input low-pass filter having an attenuation characteristic (tp1) of at least one first attenuation value (a1) substantially corresponding to between one-half and 1 . 5 times digitization clock frequency (f1) ;
a time-invariant interpolation filter (2) responsive to said first data sequence (d4) of said input low-pass filter and said digitization clock frequency (fi) for increasing a number of samples from that of the input data sequence (d1) by an integral factor to provide a second data sequence (d5) , and having an attenuation characteristic (tp2) comprising at least one second attenuation value (a2) substantially corresponding to the frequency of the digitization clock (f1) and at least one third attenuation value (a3) substantially corresponding to the between one-half and 1 . 5 times the frequency of the digitization clock ;
and a time-varying interpolation filter (3) for interpolating said second data sequence (d5) provided at an output of the time-invariant interpolation filter (2) , in response to said clock signal for suppressing signal components at twice the frequency of the digitization clock (f1) to provide said output data sequence , said time-varying filter having an attenuation characteristic (tp3) of at least one fourth attenuation value (a4) substantially corresponding to twice the frequency of the clock signal (f1) ;
wherein , said input low pass filter (1) , time-invariant interpolation filter (2) and time-variant interpolation filter (3) are coupled in series .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (one half) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6137349A
CLAIM 10
. A method for performing a sampling rate conversion of an input data sequence (d1) into an output data sequence (d2) comprising the steps of : low-pass filtering said input data sequence (d1) using a second order low-pass filter having a transfer function H(z) = (1 - z^-1)^2 and having an attenuation characteristic of at least one first attenuation value substantially corresponding to between one half (average signal) and 1 . 5 times a sampling frequency (f1) of a first clock signal to provide a first data sequence (d4) ;
doubling the number of samples of said input data sequence (d1) by filtering said first data sequence (d4) using a time invariant interpolation filter to provide a second data sequence (d5) corresponding to said first clock signal ;
performing a time-varying interpolation of said second data sequence (d5) using a linear interpolator having at least one attenuation value (a4) substantially corresponding to twice the sampling frequency (f1) to produce a third data sequence (d6) ;
reducing the sampling rate of said third data sequence (d6) by an integral factor greater than 1 to produce a fourth data sequence (d3) , and buffering said fourth data sequence (d3) according to said sampling frequency (f1) and a second clock signal (f2) for providing said output data sequence at a sampling rate corresponding to said second clock signal (f2) .
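
The comparator of claim 38 of US8990073B2 compares an average SNR (SNR av) with a threshold that depends on a long-term SNR (SNR LT). The sketch below shows that structure only. The linear threshold function and its `slope` and `offset` parameters are hypothetical; the claim says the threshold is a function of SNR LT but does not give its form.

```python
def snr_threshold(snr_lt: float, slope: float, offset: float) -> float:
    """Hypothetical threshold: a linear function of the long-term SNR."""
    return slope * snr_lt + offset


def is_active(snr_av: float, snr_lt: float,
              slope: float = 0.5, offset: float = 1.0) -> bool:
    """Flag the frame as active when the average SNR exceeds the threshold."""
    return snr_av > snr_threshold(snr_lt, slope, offset)


loud_frame = is_active(12.0, 20.0)
quiet_frame = is_active(10.0, 20.0)
```

Tying the threshold to the long-term SNR lets the decision adapt to the noise conditions instead of relying on one fixed level.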




US6061456A

Filed: 1998-06-03     Issued: 2000-05-09

Noise cancellation apparatus

(Original Assignee) Andrea Electronics Corp     (Current Assignee) Andrea Electronics Corp

Douglas Andrea, Martin Topf
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (electric signals) using a frequency spectrum (filter characteristic, phase response) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics (frequency spectrum) with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 14
. A method for calibrating an active noise reduction apparatus including a housing comprising a speaker to produce an acoustic anti-noise signal in the housing , a microphone to detect an external noise signal , and an amplitude adjustment means to calibrate the acoustic anti-noise signal to create a quiet zone in the housing for operation with an independent electrical assembly , wherein the apparatus is calibrated separately from the electrical assembly , the method comprising the steps of : inputting the external noise signal received by the microphone to produce an anti-noise signal ;
transmitting to the speaker the anti-noise signal having an equal gain and opposite phase response (frequency spectrum) to the external noise signal detected by the microphone ;
and balancing the gain and phase response of the anti-noise signal by the amplitude adjustment means located in the noise reduction apparatus to match the gain and phase response of the external noise signal to yield a theoretical zero in the quiet zone .
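
The peak detection and correlation map steps of claim 1 of US8990073B2 can be illustrated with a minimal sketch. Peaks are taken as the pieces of the current residual spectrum between successive local minima, and each peak is correlated with the same bins of the previous residual spectrum. The normalized squared correlation used here, and the choice to write one value to every bin of a peak, are assumptions for illustration; the claim does not fix the correlation measure.

```python
from typing import List, Sequence


def find_minima(spec: Sequence[float]) -> List[int]:
    """Indices of interior local minima of the spectrum."""
    return [i for i in range(1, len(spec) - 1)
            if spec[i - 1] > spec[i] <= spec[i + 1]]


def correlation_map(cur: Sequence[float], prev: Sequence[float]) -> List[float]:
    """Per-bin correlation between each current peak and the previous spectrum."""
    if len(cur) != len(prev):
        raise ValueError("spectra must have the same number of bins")
    cmap = [0.0] * len(cur)
    minima = find_minima(cur)
    # Each peak spans the bins between two successive minima (inclusive).
    for lo, hi in zip(minima[:-1], minima[1:]):
        a = cur[lo:hi + 1]
        b = prev[lo:hi + 1]
        num = sum(x * y for x, y in zip(a, b)) ** 2
        den = sum(x * x for x in a) * sum(y * y for y in b)
        value = num / den if den > 0.0 else 0.0
        for k in range(lo, hi + 1):
            cmap[k] = value
    return cmap


residual = [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 1.0]
stable = correlation_map(residual, residual)
unstable = correlation_map(residual, [0.0] * len(residual))
```

A peak that keeps its shape and position from one frame to the next yields values near 1, while a peak with no counterpart in the previous frame yields 0.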

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (filter characteristic, phase response) of the sound signal (electric signals) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics (frequency spectrum) with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 14
. A method for calibrating an active noise reduction apparatus including a housing comprising a speaker to produce an acoustic anti-noise signal in the housing , a microphone to detect an external noise signal , and an amplitude adjustment means to calibrate the acoustic anti-noise signal to create a quiet zone in the housing for operation with an independent electrical assembly , wherein the apparatus is calibrated separately from the electrical assembly , the method comprising the steps of : inputting the external noise signal received by the microphone to produce an anti-noise signal ;
transmitting to the speaker the anti-noise signal having an equal gain and opposite phase response (frequency spectrum) to the external noise signal detected by the microphone ;
and balancing the gain and phase response of the anti-noise signal by the amplitude adjustment means located in the noise reduction apparatus to match the gain and phase response of the external noise signal to yield a theoretical zero in the quiet zone .

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (electric signals) .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (electric signals) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .
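
The strong-tone search recited in claim 7 of US8990073B2 (frequency bins of the correlation map whose magnitude exceeds a given fixed threshold) can be sketched as below. This is only an illustration: the function names and the threshold value of 0.95 are hypothetical and are not taken from the patent text quoted here.

```python
from typing import List


def strong_tone_bins(cor_map: List[float], threshold: float) -> List[int]:
    """Return the indices of bins whose magnitude exceeds the fixed threshold."""
    return [k for k, value in enumerate(cor_map) if abs(value) > threshold]


def has_strong_tone(cor_map: List[float], threshold: float = 0.95) -> bool:
    """True when at least one bin exceeds the threshold.

    The default of 0.95 is an assumed value chosen for illustration only.
    """
    return len(strong_tone_bins(cor_map, threshold)) > 0
```

For example, `strong_tone_bins([0.1, 0.97, 0.5, 0.99], 0.95)` selects bins 1 and 3.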

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (electric signals) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (electric signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (electric signals) is detected .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (electric signals) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (electric signals) based on a frequency dependent signal-to-noise ratio (SNR) .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (electric signals) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (electric signals) and a ratio between a second order and a sixteenth order of linear prediction (ambient noise) residual error energies .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise (linear prediction) in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (electric signals) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (electric signals) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .
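
The classification recited in claims 18 and 19 of US8990073B2 (inactive when the average SNR is inferior to the calculated threshold, active when it is larger) can be sketched as below. The function name is hypothetical. The claims as quoted do not say what happens when the two values are equal; treating equality as active is an assumption made only so that the sketch always returns one of the two classes.

```python
def classify_sound_activity(snr_av: float, threshold: float) -> str:
    """Classify a frame from its average SNR and a previously calculated threshold."""
    if snr_av < threshold:
        # Claim 18: average SNR inferior to the threshold -> inactive sound signal.
        return "inactive"
    # Claim 19: average SNR larger than the threshold -> active sound signal.
    # Assumption: the equality case is also mapped to "active".
    return "active"
```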

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (electric signals) prevents updating of noise energy estimates when a music signal is detected .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (reduction system) parameter in order to distinguish a music signal from a background noise signal and prevent update (determined angle) of noise energy estimates on the music signal .
US6061456A
CLAIM 1
. A transducer for use in a noise cancellation apparatus for reducing background noise comprising : a housing having first microphone means for receiving a first acoustic sound composed of speech originating from an operator operating said apparatus and background noise , and for converting said first acoustic sound to a first signal , and second microphone means arranged at a predetermined angle (prevent update) Ø in close proximity with respect to said first microphone means for receiving a second acoustic sound composed of substantially said background noise and for converting said second acoustic sound to a second signal ;
said first and second microphones are connected to a differential amplifier means of the noise cancellation apparatus so as to obtain a signal representing substantially speech ;
the amplifier means for receiving acoustic sounds from each microphone and having a first terminal and a second terminal , wherein the second terminal is grounded ;
a transistor means for receiving and amplifying an AC signal representative of the audio input from each microphone ;
and means for filtering the amplified AC signal from the DC signal , so that the DC signal powers the amplifier means .

US6061456A
CLAIM 8
. A noise reduction system (noise character) for use with an active noise cancellation apparatus comprising : a pick-up microphone located in the headset for detecting noise signals to convert to electrical signals ;
a speaker located in the headset having a acoustic means with low pass filter characteristics with a zero phase shift over a desired bandwidth ;
an audio transmission signal ;
means for electrically rejecting vibrations of the electrical signal ;
a variable gain/control means for inverting the noise signal to produce an anti noise-signal ;
the acoustic means for filtering out mechanical vibration induced low frequency disturbances from reaching the speaker ;
acoustic summing means to combine the anti-noise signal and the noise signal to produce a quiet zone in the acoustic means ;
means for transmitting the audio signal to the speaker ;
and means for maintaining phase agreement between the noise signal and the anti-noise signal of the speaker .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (electric signals) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .
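
The spectral diversity calculation recited in claim 24 of US8990073B2 (a per-band ratio of current-frame to previous-frame energy, then a weighted sum over the bands above a given number) can be sketched as below. The names `band_limit`, `weights` and `eps` are hypothetical; the claim as quoted does not give the weights, and the small `eps` guard against division by zero is an assumption of this sketch.

```python
from typing import List


def spectral_diversity(
    cur_energy: List[float],
    prev_energy: List[float],
    weights: List[float],
    band_limit: int,
    eps: float = 1e-12,
) -> float:
    """Weighted sum of current/previous energy ratios for bands above band_limit."""
    total = 0.0
    # Only frequency bands with an index higher than band_limit contribute.
    for i in range(band_limit + 1, len(cur_energy)):
        ratio = cur_energy[i] / (prev_energy[i] + eps)  # eps is an assumed guard
        total += weights[i] * ratio
    return total
```

With `cur_energy=[1, 2, 4, 8]`, `prev_energy=[1, 1, 2, 2]`, `weights=[1, 1, 0.5, 0.25]` and `band_limit=1`, only bands 2 and 3 contribute, giving approximately 0.5·2 + 0.25·4 = 2.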

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (electric signals) .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (electric signals) and the complementary non-stationarity parameter .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (reduction system) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6061456A
CLAIM 8
. A noise reduction system (noise character) for use with an active noise cancellation apparatus comprising : a pick-up microphone located in the headset for detecting noise signals to convert to electrical signals ;
a speaker located in the headset having a acoustic means with low pass filter characteristics with a zero phase shift over a desired bandwidth ;
an audio transmission signal ;
means for electrically rejecting vibrations of the electrical signal ;
a variable gain/control means for inverting the noise signal to produce an anti noise-signal ;
the acoustic means for filtering out mechanical vibration induced low frequency disturbances from reaching the speaker ;
acoustic summing means to combine the anti-noise signal and the noise signal to produce a quiet zone in the acoustic means ;
means for transmitting the audio signal to the speaker ;
and means for maintaining phase agreement between the noise signal and the anti-noise signal of the speaker .
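
The noise character parameter recited in claim 28 of US8990073B2 (split the bands into two groups, compute an energy value per group, take their ratio, then track a long-term value) can be sketched as below. Several details are assumptions of this sketch and not statements of the patent: each group energy is taken as a plain sum, the ratio is first group over second group, the long-term value is a first-order recursive average, and the names `n_first`, `alpha` and `eps` are hypothetical.

```python
from typing import List


def noise_character(band_energy: List[float], n_first: int, eps: float = 1e-12) -> float:
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])   # first group of frequency bands
    second = sum(band_energy[n_first:])  # second group: the rest of the bands
    return first / (second + eps)        # eps is an assumed divide-by-zero guard


def long_term_noise_character(prev_long_term: float, current: float, alpha: float = 0.9) -> float:
    """Assumed recursive average: alpha weights the history, (1 - alpha) the new value."""
    return alpha * prev_long_term + (1.0 - alpha) * current
```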

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (reduction system) parameter inferior than a given fixed threshold .
US6061456A
CLAIM 8
. A noise reduction system (noise character) for use with an active noise cancellation apparatus comprising : a pick-up microphone located in the headset for detecting noise signals to convert to electrical signals ;
a speaker located in the headset having a acoustic means with low pass filter characteristics with a zero phase shift over a desired bandwidth ;
an audio transmission signal ;
means for electrically rejecting vibrations of the electrical signal ;
a variable gain/control means for inverting the noise signal to produce an anti noise-signal ;
the acoustic means for filtering out mechanical vibration induced low frequency disturbances from reaching the speaker ;
acoustic summing means to combine the anti-noise signal and the noise signal to produce a quiet zone in the acoustic means ;
means for transmitting the audio signal to the speaker ;
and means for maintaining phase agreement between the noise signal and the anti-noise signal of the speaker .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (electric signals) using a frequency spectrum (filter characteristic, phase response) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics (frequency spectrum) with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 14
. A method for calibrating an active noise reduction apparatus including a housing comprising a speaker to produce an acoustic anti-noise signal in the housing , a microphone to detect an external noise signal , and an amplitude adjustment means to calibrate the acoustic anti-noise signal to create a quiet zone in the housing for operation with an independent electrical assembly , wherein the apparatus is calibrated separately from the electrical assembly , the method comprising the steps of : inputting the external noise signal received by the microphone to produce an anti-noise signal ;
transmitting to the speaker the anti-noise signal having an equal gain and opposite phase response (frequency spectrum) to the external noise signal detected by the microphone ;
and balancing the gain and phase response of the anti-noise signal by the amplitude adjustment means located in the noise reduction apparatus to match the gain and phase response of the external noise signal to yield a theoretical zero in the quiet zone .
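
The four elements of claim 30 of US8990073B2 (residual spectrum, peak detection between successive minima, per-peak correlation map, long-term correlation map) can be sketched end to end as below. This is a sketch under stated assumptions, not the patented implementation: the spectral floor is taken as a straight line between neighbouring minima, the end bins are always treated as minima, negative residual values are clamped to zero, the per-peak correlation is a normalized cross-correlation, and the long-term map is a first-order recursive average. All function names and the update factor `alpha` are hypothetical.

```python
import math
from typing import List


def local_minima(x: List[float]) -> List[int]:
    """Indices of local minima; both end bins are always included (assumption)."""
    idx = [0]
    for k in range(1, len(x) - 1):
        if x[k] < x[k - 1] and x[k] <= x[k + 1]:
            idx.append(k)
    if len(x) > 1:
        idx.append(len(x) - 1)
    return idx


def residual_spectrum(spec: List[float]) -> List[float]:
    """Subtract a floor that linearly connects successive minima (assumed shape)."""
    mins = local_minima(spec)
    floor = list(spec)
    for a, b in zip(mins, mins[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spec[a] + t * (spec[b] - spec[a])
    # Clamping at zero is an assumption; the claim only recites the subtraction.
    return [max(s - f, 0.0) for s, f in zip(spec, floor)]


def correlation_map(cur_res: List[float], prev_res: List[float]) -> List[float]:
    """One normalized correlation per peak, written to every bin of that peak."""
    mins = local_minima(cur_res)  # a peak is the piece between successive minima
    cor = [0.0] * len(cur_res)
    for a, b in zip(mins, mins[1:]):
        bins = range(a, b + 1)
        num = sum(cur_res[k] * prev_res[k] for k in bins)
        e_cur = sum(cur_res[k] ** 2 for k in bins)
        e_prev = sum(prev_res[k] ** 2 for k in bins)
        value = num / math.sqrt(e_cur * e_prev) if e_cur > 0 and e_prev > 0 else 0.0
        for k in bins:
            cor[k] = value
    return cor


def update_long_term_map(lt_map: List[float], cor_map: List[float], alpha: float = 0.9) -> List[float]:
    """Recursive update from the previous (or initial) long-term map and the current map."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cor_map)]
```

A caller would keep the previous frame's residual spectrum and the long-term map between frames, starting the long-term map from an initial value such as all zeros (again an assumed choice).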

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (electric signals) using a frequency spectrum (filter characteristic, phase response) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics (frequency spectrum) with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 14
. A method for calibrating an active noise reduction apparatus including a housing comprising a speaker to produce an acoustic anti-noise signal in the housing , a microphone to detect an external noise signal , and an amplitude adjustment means to calibrate the acoustic anti-noise signal to create a quiet zone in the housing for operation with an independent electrical assembly , wherein the apparatus is calibrated separately from the electrical assembly , the method comprising the steps of : inputting the external noise signal received by the microphone to produce an anti-noise signal ;
transmitting to the speaker the anti-noise signal having an equal gain and opposite phase response (frequency spectrum) to the external noise signal detected by the microphone ;
and balancing the gain and phase response of the anti-noise signal by the amplitude adjustment means located in the noise reduction apparatus to match the gain and phase response of the external noise signal to yield a theoretical zero in the quiet zone .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (filter characteristic, phase response) of the sound signal (electric signals) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6061456A
CLAIM 4
. An active noise reduction apparatus for reducing ambient noise in the vicinity of an eardrum , comprising : a housing for receiving an input audio signal ;
an output transducer located in the housing ;
an input transducer for detecting and reducing ambient noise , the input transducer not located in substantially the same plane as the output transducer ;
signal processing means to reduce the ambient noise detected by the input transducer ;
an acoustic means including an acoustic waveguide for transmitting the input audio signal to the eardrum without disturbance of the ambient noise and having low pass filter characteristics (frequency spectrum) with a zero phase shift over a desired bandwidth to isolate the output transducer from the input transducer for channeling the input audio signal representing substantial speech between the output transducer and the eardrum , wherein a quiet zone is created to isolate sound transmitted from the input transducer .

US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 14
. A method for calibrating an active noise reduction apparatus including a housing comprising a speaker to produce an acoustic anti-noise signal in the housing , a microphone to detect an external noise signal , and an amplitude adjustment means to calibrate the acoustic anti-noise signal to create a quiet zone in the housing for operation with an independent electrical assembly , wherein the apparatus is calibrated separately from the electrical assembly , the method comprising the steps of : inputting the external noise signal received by the microphone to produce an anti-noise signal ;
transmitting to the speaker the anti-noise signal having an equal gain and opposite phase response (frequency spectrum) to the external noise signal detected by the microphone ;
and balancing the gain and phase response of the anti-noise signal by the amplitude adjustment means located in the noise reduction apparatus to match the gain and phase response of the external noise signal to yield a theoretical zero in the quiet zone .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (electric signals) .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (electric signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (electric signals) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (second microphones) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6061456A
CLAIM 1
. A transducer for use in a noise cancellation apparatus for reducing background noise comprising : a housing having first microphone means for receiving a first acoustic sound composed of speech originating from an operator operating said apparatus and background noise , and for converting said first acoustic sound to a first signal , and second microphone means arranged at a predetermined angle Ø in close proximity with respect to said first microphone means for receiving a second acoustic sound composed of substantially said background noise and for converting said second acoustic sound to a second signal ;
said first and second microphones (average signal) are connected to a differential amplifier means of the noise cancellation apparatus so as to obtain a signal representing substantially speech ;
the amplifier means for receiving acoustic sounds from each microphone and having a first terminal and a second terminal , wherein the second terminal is grounded ;
a transistor means for receiving and amplifying an AC signal representative of the audio input from each microphone ;
and means for filtering the amplified AC signal from the DC signal , so that the DC signal powers the amplifier means .
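
The comparator recited in claim 38 of US8990073B2 can be illustrated with a minimal sketch. The claim only says that the threshold is a function of the long-term SNR; the linear form and the `slope` and `offset` values below are hypothetical placeholders, not values taken from the patent.

```python
def snr_threshold(snr_lt, slope=0.5, offset=10.0):
    """Threshold as a function of the long-term SNR (SNR_LT).

    The linear shape and both coefficients are assumptions made
    for illustration only.
    """
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR (SNR_av) with the SNR_LT-dependent threshold."""
    return snr_av > snr_threshold(snr_lt)
```

With these placeholder coefficients, a frame with SNR_av = 30 and SNR_LT = 20 would be flagged active, while SNR_av = 15 at the same SNR_LT would not.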

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (reduction system) of the sound signal (electric signals) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .

US6061456A
CLAIM 8
. A noise reduction system (noise character) for use with an active noise cancellation apparatus comprising : a pick-up microphone located in the headset for detecting noise signals to convert to electrical signals ;
a speaker located in the headset having a acoustic means with low pass filter characteristics with a zero phase shift over a desired bandwidth ;
an audio transmission signal ;
means for electrically rejecting vibrations of the electrical signal ;
a variable gain/control means for inverting the noise signal to produce an anti noise-signal ;
the acoustic means for filtering out mechanical vibration induced low frequency disturbances from reaching the speaker ;
acoustic summing means to combine the anti-noise signal and the noise signal to produce a quiet zone in the acoustic means ;
means for transmitting the audio signal to the speaker ;
and means for maintaining phase agreement between the noise signal and the anti-noise signal of the speaker .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (electric signals) .
US6061456A
CLAIM 6
. An active noise reduction apparatus comprising : a housing having an earphone ;
microphone means mounted in the earphone facing towards an ear of a user for detecting unwanted ambient noise ;
means to convert the noise to electric signals (sound signal, sound signal prevents updating) ;
phase shifting and attenuation means connected to the microphone to provide an inverted anti-noise signal ;
an output transducer substantially out of plane with the microphone means for transmitting the audio signal to the user's ear ;
means for preventing mechanical vibration induced low frequency disturbances from being transmitted to the output transducer ;
an acoustic waveguide isolating the microphone means from the output transducer for creating a quiet zone in close proximity to the output transducer and thereby excluding the unwanted ambient noise from reaching the user's ear .




US6108626A

Filed: 1998-05-14     Issued: 2000-08-22

Object oriented audio coding

(Original Assignee) Robert Bosch GmbH; Centro Studi e Laboratori Telecomunicazioni SpA (CSELT)     (Current Assignee) CSELT- CENTRO STUDI E LABORATORI TELECOMUNICAZIONI SpA ; Robert Bosch GmbH ; Centro Studi e Laboratori Telecomunicazioni SpA (CSELT) ; Nuance Communications Inc

Luca Cellario, Michele Festa, Jorg Muller, Daniele Sereno
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (linear prediction analysis) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long term correlation map .
US6108626A
CLAIM 4
. A method as claimed in claim 1 characterized in that the first algorithms include linear prediction analysis (frequency spectrum) coding algorithms at least for signals of a lower set of frequency bands , and shape/gain vector quantization coding algorithms for signals of higher frequency bands and for signals where linear prediction is not exploited .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .
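
For orientation when weighing the mapping above, the last three steps of claim 1 of US8990073B2 (peak detection, per-peak correlation map, long-term correlation map) can be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation: the normalized-correlation formula, the exponential-smoothing update and the default `alpha` are assumptions, and all function names are hypothetical.

```python
def residual_minima(res):
    """Indices of local minima of a residual spectrum, end bins included."""
    idx = [0]
    for i in range(1, len(res) - 1):
        if res[i] < res[i - 1] and res[i] <= res[i + 1]:
            idx.append(i)
    idx.append(len(res) - 1)
    return idx


def correlation_map(cur, prev):
    """One correlation value per detected peak, written to every bin of the peak.

    A peak is the piece of `cur` between two successive minima. It is
    correlated with the same bin range of `prev` (the "shape" at the
    peak position). The normalization used here is an assumption.
    A minimum shared by two peaks keeps the value of the later peak.
    """
    cmap = [0.0] * len(cur)
    minima = residual_minima(cur)
    for a, b in zip(minima, minima[1:]):
        c_seg, p_seg = cur[a:b + 1], prev[a:b + 1]
        num = sum(c * p for c, p in zip(c_seg, p_seg))
        den = sum(c * c for c in c_seg) * sum(p * p for p in p_seg)
        value = num * num / den if den > 0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap


def update_long_term(cmap_lt, cmap, alpha=0.9):
    """Long-term map from an update factor, the current map and the prior state.

    `cmap_lt` starts from an initial value (for example all zeros);
    exponential smoothing with factor `alpha` is an assumed form.
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(cmap_lt, cmap)]
```

A stable tone produces the same peak shape in consecutive frames, so its correlation values stay near 1 and the long-term map grows; noise-like frames decorrelate and the map decays.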

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (linear prediction analysis) of the sound signal in the current frame (given frequency) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6108626A
CLAIM 4
. A method as claimed in claim 1 characterized in that the first algorithms include linear prediction analysis (frequency spectrum) coding algorithms at least for signals of a lower set of frequency bands , and shape/gain vector quantization coding algorithms for signals of higher frequency bands and for signals where linear prediction is not exploited .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .
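
The three steps of claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as below. Connecting the minima by straight lines and treating the two end bins as minima are assumptions made for this sketch; the claim does not fix either detail, and the function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima; both end bins are included so the floor spans all bins."""
    if len(spectrum) < 2:
        raise ValueError("need at least two spectral bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum of the current frame."""
    floor = spectral_floor(spectrum, find_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]
```

By construction the residual is zero at every minimum, so each peak of the residual sits between two zero-valued bins.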

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (sampling means) indicative of sound activity in the sound signal .
US6108626A
CLAIM 33
. An apparatus as claimed in claim 32 , characterized in that said encoder (AC) receives a signal sampled at any arbitrary input sampling frequency from 8 to 64 kHz , and further comprises means (MXU) for upsampling said signal at an internal sampling frequency which is the power of 2 immediately higher than the input sampling frequency ;
said upsampling means (adaptive threshold) (MXU) being disabled for input sampling frequencies of 8 , 16 and 32 kHz .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (said second set) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (following steps) and a sixteenth order of linear prediction (classification results, speech signal) residual error energies .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 12
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to the first coding step is carried out by the following operations : a) determining a total bandwidth allocable to the coded signal for the available bit rate ;
b) determining the energy associated to each band included in said bandwidth , and comparing said energy with a respective first energy threshold (residual error) ;
c) enabling insertion of core information for all bands of which the energy exceeds the respective threshold .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US6108626A
CLAIM 23
. A method as claimed in claim 22 , characterized in that said scaling comprises the following steps (second order) : a1) determining a bandwidth allocable in the frame to the or each macro-object for a desired bit rate ;
b1) eliminating bit packets relevant to frequency bands which cause an exceeding of said bandwidth ;
c1) if the residual bit rate exceeds the desired bit rate , eliminating one block of enhancement information for each band , starting from the band with the highest frequency , until the desired bit rate is attained or the core information only is left , the elimination being cyclically repeated , if necessary ;
d1) if the residual bit rate at the end of step c1) still exceeds the desired bit rate , eliminating core packets of one or more frequency bands , starting from the highest frequency one .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (classification results, speech signal) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (said second set) on the music signal .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (given frequency) energy and an average frame energy .
US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (given frequency) and an energy of the sound signal in a previous frame , for frequency bands (different frequency band, predetermined bandwidth, same frequency band) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6108626A
CLAIM 8
. A method as claimed in claim 1 , characterized in that said frequency bands have a predetermined bandwidth (frequency bands, first frequency bands, frequency dependent signal) , independently of a sampling frequency of the signal to be coded .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands (frequency bands, first frequency bands, frequency dependent signal) having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US6108626A
CLAIM 38
. An apparatus as claimed in claim 32 , characterized in that the second coding units (LEC , HEC) associated with a frequency band code a quantization error obtained as a result of the application of the first coding algorithm to signals in the same frequency band (frequency bands, first frequency bands, frequency dependent signal) .
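
The computation recited in claim 24 of US8990073B2 can be sketched as a per-band energy ratio followed by a weighted sum. The direction of the ratio (current over previous), the uniform default weights and the `eps` guard are assumptions for illustration; the claim does not specify them, and the names are hypothetical.

```python
def spectral_diversity(cur_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands above `min_band`.

    `cur_energy` and `prev_energy` hold one energy value per frequency band.
    Only bands with index strictly higher than `min_band` contribute.
    """
    if weights is None:
        weights = [1.0] * len(cur_energy)  # assumed uniform weighting
    total = 0.0
    for b in range(min_band + 1, len(cur_energy)):
        ratio = cur_energy[b] / max(prev_energy[b], eps)  # eps avoids division by zero
        total += weights[b] * ratio
    return total
```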

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (classification results, speech signal) indicative of an activity of the sound signal .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (classification results, speech signal) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (said second set) is prevented in response to having simultaneously the activity prediction parameter (classification results, speech signal) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .
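
The gating condition recited in claim 27 of US8990073B2 can be sketched as a single boolean test. This is a minimal illustration only: the function name, argument names, and threshold values are hypothetical and are not taken from either patent.

```python
def noise_update_allowed(act_pred, nonstat_compl, thr_act, thr_nonstat):
    """Return False when the noise-estimate update should be prevented.

    Per the claim language, the update is prevented only when BOTH
    parameters exceed their respective fixed thresholds at the same time.
    All names and thresholds here are illustrative assumptions.
    """
    both_high = act_pred > thr_act and nonstat_compl > thr_nonstat
    return not both_high


# Example: only the first case has both values above their thresholds.
print(noise_update_allowed(0.9, 0.9, 0.5, 0.5))  # update prevented
print(noise_update_allowed(0.9, 0.1, 0.5, 0.5))  # update allowed
```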

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (classification results, speech signal) comprises : dividing a plurality of frequency bands (different frequency band, predetermined bandwidth, same frequency band) into a first group (first group) of a certain number of first frequency bands and a second group (second group, steps a) of a rest of the frequency bands ;

calculating a first energy (first energy) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6108626A
CLAIM 8
. A method as claimed in claim 1 , characterized in that said frequency bands have a predetermined bandwidth (frequency bands, first frequency bands, frequency dependent signal) , independently of a sampling frequency of the signal to be coded .

US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps (second group) are to be performed and the number of second coding steps for a given frequency band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .

US6108626A
CLAIM 12
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to the first coding step is carried out by the following operations : a) determining a total bandwidth allocable to the coded signal for the available bit rate ;
b) determining the energy associated to each band included in said bandwidth , and comparing said energy with a respective first energy (first energy) threshold ;
c) enabling insertion of core information for all bands of which the energy exceeds the respective threshold .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group (first group) of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group (second group) of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency band (frequency bands, first frequency bands, frequency dependent signal) s having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US6108626A
CLAIM 38
. An apparatus as claimed in claim 32 , characterized in that the second coding units (LEC , HEC) associated with a frequency band code a quantization error obtained as a result of the application of the first coding algorithm to signals in the same frequency band (frequency bands, first frequency bands, frequency dependent signal) .
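
The four steps of claim 28 of US8990073B2 (split the bands into two groups, compute one energy value per group, take their ratio, and track a long-term value) can be sketched as follows. The direction of the ratio, the use of a plain sum as the group energy, the recursive averaging form, and the constant `alpha` are all assumptions made for illustration; the claim does not fix any of them.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of first-group energy to second-group energy.

    band_energies: per-band energies for one frame (hypothetical input).
    n_first: number of bands placed in the first group; the rest form
             the second group.
    The ratio direction (first / second) is an assumption.
    """
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    return first_energy / max(second_energy, eps)


def update_long_term(prev_long_term, current, alpha=0.9):
    """Recursive average used here as one possible long-term value."""
    return alpha * prev_long_term + (1.0 - alpha) * current


# Example: four bands, the first two form the first group.
ratio = noise_character([1.0, 1.0, 2.0, 2.0], n_first=2)
long_term = update_long_term(0.0, ratio)
print(ratio, long_term)
```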

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (said second set) is prevented in response to having the noise character parameter (classification results, speech signal) inferior than a given fixed threshold .
US6108626A
CLAIM 10
. A method as claimed in claim 1 , characterized in that , for speech signal (linear prediction, activity prediction parameter, noise character parameter) processing , the coding phase comprises the steps of : selecting a standard-defined speech coding algorithm as first coding algorithm for a whole set of frequency bands ;
building the basic layer with the core information generated by submitting the speech signal to the standard-defined algorithm ;
and building a coded signal corresponding to one of said intermediate layers or to the total layer , so as to obtain a coded signal upgraded with respect to the standard-defined coded signal ;
and in that the decoding phase comprises the steps of a) decoding the only basic layer , or b) decoding the whole of the coded signal , depending on the availability of decoding algorithms and/or the quality to be attained for the decoded signal .

US6108626A
CLAIM 21
. A method according to claim 1 , characterized in that said object bit streams are made up by packets of bits produced by individual coding steps and said macro-object bit stream (OB11 . . . 0821 --) comprises : a first group of overhead bits (OVH1 , OVH2) containing information regarding the classification results (linear prediction, activity prediction parameter, noise character parameter) and the frequency bands being submitted to at least the first coding step ;
the packets of the core information ;
and , if second coding steps have been performed , a second group of overhead bit (OVH3) containing information regarding the number of coding steps performed for the different frequency bands having been submitted to at least the first coding step , and the packets of the enhancement information blocks ;
and in that bit streams of different macro-objects (OB11 . . . 0821) coded in the frame are transmitted in sequence , the transmission being preceded by a configuration phase in which a further group of overhead bits (OVHO) is transmitted , which group contains all service information necessary for the configuration of a decoder (AD) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (linear prediction analysis) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long-term correlation map .
US6108626A
CLAIM 4
. A method as claimed in claim 1 characterized in that the first algorithms include linear prediction analysis (frequency spectrum) coding algorithms at least for signals of a lower set of frequency bands , and shape/gain vector quantization coding algorithms for signals of higher frequency bands and for signals where linear prediction is not exploited .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .
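
The peak detection and correlation map elements of claim 30 of US8990073B2 can be sketched as below. Peaks are taken as the pieces of the residual spectrum between successive local minima, as the claim recites. The normalized squared cross-correlation used as the per-peak score, the treatment of the two end bins as minima, and all function names are assumptions for illustration; the claim language does not specify a formula.

```python
def local_minima(spec):
    """Indices of local minima; both end points are included by assumption."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def correlation_map(res_cur, res_prev):
    """Score each peak of res_cur against the same bins of res_prev.

    Every bin of a peak receives that peak's score. A bin shared by two
    adjacent peaks (the minimum between them) keeps the later peak's score.
    """
    minima = local_minima(res_cur)
    cmap = [0.0] * len(res_cur)
    for lo, hi in zip(minima, minima[1:]):
        cur = res_cur[lo:hi + 1]
        prev = res_prev[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev)
        score = num / den if den > 0 else 0.0
        for k in range(lo, hi + 1):
            cmap[k] = score
    return cmap


# Example: the first peak is unchanged between frames, the second vanished.
cur = [0, 1, 3, 1, 0, 2, 4, 2, 0]
prev = [0, 1, 3, 1, 0, 0, 0, 0, 0]
print(correlation_map(cur, prev))
```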

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (linear prediction analysis) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (given frequency) , and an initial value of the long-term correlation map .
US6108626A
CLAIM 4
. A method as claimed in claim 1 characterized in that the first algorithms include linear prediction analysis (frequency spectrum) coding algorithms at least for signals of a lower set of frequency bands , and shape/gain vector quantization coding algorithms for signals of higher frequency bands and for signals where linear prediction is not exploited .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (linear prediction analysis) of the sound signal in the current frame (given frequency) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6108626A
CLAIM 4
. A method as claimed in claim 1 characterized in that the first algorithms include linear prediction analysis (frequency spectrum) coding algorithms at least for signals of a lower set of frequency bands , and shape/gain vector quantization coding algorithms for signals of higher frequency bands and for signals where linear prediction is not exploited .

US6108626A
CLAIM 11
. A method as claimed in claim 1 , characterized in that the selection of the frequency bands to be submitted to at least the first coding step , the selection of the bands for which also second coding steps are to be performed and the number of second coding steps for a given frequency (current frame) band are determined in dependency of the bandwidth and bit rate desired for the coded signal and on requirements of a user equipment (US) and of a system (SY) in which the coded signal is exploited , independently of the bandwidth and sampling frequency of the signal to be coded , on a frame per frame basis .
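
The locator, estimator, and subtractor of claim 32 of US8990073B2 can be sketched as three small steps. Connecting the minima with straight line segments and treating the first and last bins as minima are assumptions made for this illustration; the claim only states that the floor "connects the minima".

```python
def spectral_floor(spec):
    """Piecewise-linear floor through the local minima of spec."""
    n = len(spec)
    # Locate the minima (end bins included by assumption).
    minima = [0]
    minima += [i for i in range(1, n - 1)
               if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]]
    minima.append(n - 1)
    # Connect successive minima with straight lines.
    floor = [0.0] * n
    for lo, hi in zip(minima, minima[1:]):
        for k in range(lo, hi + 1):
            t = (k - lo) / (hi - lo) if hi > lo else 0.0
            floor[k] = spec[lo] + t * (spec[hi] - spec[lo])
    return floor


def residual_spectrum(spec):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    return [s - f for s, f in zip(spec, spectral_floor(spec))]


# Example with two peaks sitting on a rising floor.
print(spectral_floor([2, 5, 2, 6, 4]))
print(residual_spectrum([2, 5, 2, 6, 4]))
```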

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (said second set) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6108626A
CLAIM 2
. A method as claimed in claim 1 , characterized in that the first and second algorithms are independently selected for different band (updating noise energy estimates) s .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5983139A

Filed: 1998-04-28     Issued: 1999-11-09

Cochlear implant system

(Original Assignee) MED EL Elektromedizinische Geraete GmbH     (Current Assignee) MED EL Elektromedizinische Geraete GmbH

Clemens Zierhofer
US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability (selected time) , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US5983139A
CLAIM 5
. A cochlear implant system as in claim 1 , wherein the low-pass FIR filter is further comprised of : an input filter to convolve the digital sequence to produce a multi-level sequence having a plurality of allowable values ;
a peripheral filter to convolve the multi-level sequence to produce the low-pass vector ;
an output stage including at least one output counter to downsample the low-pass vector at selected time (pitch stability) s ;
and a low-pass random access memory (RAM) to sequentially store the downsampled low-pass vector .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands (binary sequence) and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5983139A
CLAIM 3
. A cochlear implant system as in claim 1 , wherein the digital sequence is a two-level binary sequence (first frequency bands) .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (one half) to noise ratio (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US5983139A
CLAIM 12
. A cochlear implant system as in claim 11 , wherein the ALU estimates the value of the square root of the sum of two squares by : determining the greater of the roots of the two squares and the lesser of the roots of the two squares ;
calculating a sum of one half (average signal) the lesser of the roots of the two squares and one half a product of the greater of the roots of the two squares and the square root of three ;
and selecting whichever is larger between the greater of the roots of the two squares and the calculated sum .
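
The square-root estimate recited in claim 12 of US5983139A can be written out directly from the claim text. The function name and the comparison against the exact value are added for illustration only. Because the candidate sum equals the exact magnitude scaled by the cosine of an angle offset of at most 15 degrees, the estimate never exceeds the exact value and undershoots it by at most about 3.4 percent.

```python
import math


def approx_hypot(a, b):
    """Estimate sqrt(a*a + b*b) as described in the quoted claim.

    greater / lesser are the roots of the two squares, i.e. |a| and |b|.
    """
    greater = max(abs(a), abs(b))
    lesser = min(abs(a), abs(b))
    # One half the lesser, plus one half the product of the greater
    # and the square root of three.
    candidate = 0.5 * lesser + 0.5 * greater * math.sqrt(3.0)
    # Select whichever is larger.
    return max(greater, candidate)


# Example: compare against the exact value for a 3-4-5 triangle.
print(approx_hypot(3.0, 4.0), math.hypot(3.0, 4.0))
```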




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US7016507B1

Filed: 1998-04-16     Issued: 2006-03-21

Method and apparatus for noise reduction particularly in hearing aids

(Original Assignee) Ami Semiconductor Inc     (Current Assignee) Ami Semiconductor Inc ; BANK OF NOVA SCOTIA

Robert Brennan
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (second noise, noise filter) using a frequency spectrum (time frames) of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time frames (frequency spectrum) , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
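
The frame-to-frame smoothing of the attenuation function in claim 12 of US7016507B1 is a first-order recursion with forgetting factor γ. A minimal sketch follows, assuming the new value H(f) is weighted by (1 − γ) and the previous smoothed value by γ; the function name and the list-per-frame representation are hypothetical.

```python
def smooth_attenuation(h, g_prev, gamma):
    """One smoothing step per frequency bin.

    h:      attenuation function H(f) for the current frame
    g_prev: smoothed attenuation from the previous frame
    gamma:  forgetting factor in [0, 1]; larger values forget more slowly
    """
    return [(1.0 - gamma) * hk + gamma * gk for hk, gk in zip(h, g_prev)]


# Example: two bins, equal weight on the new and the previous value.
print(smooth_attenuation([1.0, 0.0], [0.0, 1.0], 0.5))
```

With gamma set to 0 the output follows H(f) exactly, and with gamma set to 1 the previous smoothed value is held unchanged.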

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum (time frames) of the sound signal (second noise, noise filter) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time frames (frequency spectrum) , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (periodic signal) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US7016507B1
CLAIM 19
. A method as claimed in claim 1 , wherein detecting the presence or absence of speech comprises : (1) taking a block of the input signal and performing an auto-correlation on that block to form a correlated signal ;
and , (2) checking the correlated signal for the presence of a periodic signal (frequency bin) having a pitch corresponding to that for a desired audio signal .
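
The two steps of claim 5 of US8990073B2 (a one-pole filter applied bin by bin, followed by a sum over the bins) can be sketched as follows. The filter form, the update factor `alpha`, the zero initial map, and the function names are assumptions for illustration; the claim names a one-pole filter but gives no coefficients.

```python
def update_long_term_map(lt_map, cor_map, alpha=0.9):
    """One-pole (first-order recursive) filter applied to every bin."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cor_map)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)


# Example: start from an all-zero initial map and feed the same
# per-frame correlation map three times.
lt_map = [0.0, 0.0, 0.0]
for _ in range(3):
    lt_map = update_long_term_map(lt_map, [1.0, 0.0, 1.0], alpha=0.5)
print(lt_map, summed_long_term_map(lt_map))
```

A stable tone keeps its bins near 1 in every frame, so the summed value grows toward the number of tonal bins, while bins that correlate only occasionally decay back toward zero.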

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (second noise, noise filter) .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (second noise, noise filter) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
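
Claim 7 of US8990073B2 above recites searching the correlation map for frequency bins whose magnitude exceeds a given fixed threshold. A minimal sketch of that search follows, assuming the correlation map is available as a plain list of per-bin values; the function name, the sample values, and the threshold of 0.9 are hypothetical and are not taken from either patent.

```python
def find_strong_tone_bins(correlation_map, threshold):
    """Return the indices of bins whose magnitude exceeds a fixed threshold.

    correlation_map: per-bin correlation values (assumed layout, one value per bin).
    threshold: fixed threshold (illustrative value chosen by the caller).
    """
    return [k for k, value in enumerate(correlation_map) if abs(value) > threshold]


# Illustrative data only: five bins, two of them strongly correlated.
cor_map = [0.10, 0.92, 0.35, 0.97, 0.50]
strong_bins = find_strong_tone_bins(cor_map, threshold=0.9)
print(strong_bins)  # [1, 3]
```

A non-empty result would correspond to at least one strong tone under this reading of the claim; the prior-art claims charted here do not recite an equivalent per-bin search.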

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (second noise, noise filter) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (high frequencies) in the sound signal .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
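
Claim 10 of US7016507B1 above divides the oversubtraction factor β by a preemphasis function P(f) so that the modified factor is smaller at high frequencies. The sketch below illustrates only that division. The claim does not give a form for P(f), so the linear function used here, its corner frequency, and all names are assumptions made for illustration.

```python
def linear_preemphasis(f_hz, corner_hz=1000.0):
    """Hypothetical P(f) = 1 + f / corner_hz; it grows with frequency."""
    return 1.0 + f_hz / corner_hz


def modified_oversubtraction(beta, freqs_hz, preemphasis=linear_preemphasis):
    """Return beta / P(f) for each frequency, as the claim describes."""
    return [beta / preemphasis(f) for f in freqs_hz]


# Illustrative values only: beta = 2.0 evaluated at three frequencies.
factors = modified_oversubtraction(2.0, [0.0, 1000.0, 3000.0])
print(factors)  # [2.0, 1.0, 0.5]
```

Because the assumed P(f) increases with frequency, the modified factor decreases with frequency, which matches the claimed effect of reducing attenuation at high frequencies.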

US8990073B2
CLAIM 10
. A method for detecting sound activity (high frequencies) in a sound signal (second noise, noise filter) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (second noise, noise filter) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (second noise, noise filter) is detected .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
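
Claim 11 of US8990073B2 above recites preventing the update of noise energy estimates when a tonal sound signal is detected. One way to picture that gating is sketched below. The exponential smoothing rule, the smoothing constant, and the function name are assumptions for illustration; the claim itself specifies only that the update is prevented.

```python
def update_noise_estimate(noise_energy, frame_energy, tonal_detected, alpha=0.9):
    """Return the per-band noise energy estimates for the next frame.

    When tonal_detected is True the estimates are returned unchanged,
    which is the gating behaviour the claim recites. Otherwise a
    hypothetical exponential smoothing update is applied per band.
    """
    if tonal_detected:
        return list(noise_energy)
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(noise_energy, frame_energy)]


# Illustrative values only: two bands.
frozen = update_noise_estimate([1.0, 2.0], [5.0, 6.0], tonal_detected=True)
updated = update_noise_estimate([1.0, 2.0], [5.0, 6.0], tonal_detected=False, alpha=0.5)
print(frozen)   # [1.0, 2.0]
print(updated)  # [3.0, 4.0]
```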

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (high frequencies) in the sound signal (second noise, noise filter) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection comprises detecting the sound signal (second noise, noise filter) based on a frequency dependent signal-to-noise ratio (SNR) .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection comprises comparing an average signal-to-noise ratio (SNR_av) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR_LT) .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection in the sound signal (second noise, noise filter) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (high frequencies) detection further comprises updating the noise estimates for a next frame .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (second noise, noise filter) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (second noise, noise filter) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR_av) is inferior to the calculated threshold .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (second noise, noise filter) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR_av) is larger than the calculated threshold .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
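
Claims 14, 18 and 19 of US8990073B2 above compare an average SNR with a threshold computed from a long-term SNR, classifying the signal as inactive when the average is below the threshold and active when it is above. The sketch below shows that decision under stated assumptions: the claims do not give the threshold function, so the linear mapping, its slope and offset, and the function names are hypothetical. The claims are also silent on the case of exact equality, which this sketch treats as inactive.

```python
def snr_threshold(snr_lt_db, slope=0.5, offset=2.0):
    """Hypothetical threshold as a linear function of the long-term SNR (dB)."""
    return slope * snr_lt_db + offset


def classify_frame(snr_av_db, snr_lt_db):
    """Return 'active' if the average SNR exceeds the threshold, else 'inactive'."""
    if snr_av_db > snr_threshold(snr_lt_db):
        return "active"
    return "inactive"


# Illustrative values only: long-term SNR of 20 dB gives a threshold of 12 dB.
print(classify_frame(15.0, 20.0))  # active
print(classify_frame(5.0, 20.0))   # inactive
```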

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (second noise, noise filter) prevents updating of noise energy estimates when a music signal (second noise, noise filter) is detected .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (second noise, noise filter) from a background noise signal and prevent update of noise energy estimates on the music signal .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (second noise, noise filter) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
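
To make the computation recited in claim 24 of US8990073B2 above concrete, the following is a minimal sketch and not the patentee's implementation. It assumes per-band energies are already available as lists, that "higher than a given number" means band indices strictly greater than `min_band`, and that the weights default to 1 when none are given. The function name, parameter names, and the `eps` guard against division by zero are all hypothetical.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands above min_band."""
    if len(curr_energy) != len(prev_energy):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        # Assumption: uniform weights; the claim does not specify them.
        weights = [1.0] * len(curr_energy)
    total = 0.0
    for band in range(min_band + 1, len(curr_energy)):
        # Ratio between the band energy in the current and the previous frame.
        ratio = curr_energy[band] / max(prev_energy[band], eps)
        total += weights[band] * ratio
    return total
```

For example, `spectral_diversity([1, 2, 4, 8], [1, 1, 2, 2], min_band=1)` uses only bands 2 and 3.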

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (successive time) indicative of an activity of the sound signal (second noise, noise filter) .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time (first frequency, activity prediction parameter, updating noise energy estimates) frames , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .
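
As an illustration of the smoothing recursion recited in claim 12 of US7016507B1 above, here is a minimal sketch. It assumes the attenuation function H(f) is supplied per frame as a list of per-bin gains, and that the first frame initializes the smoothed gains to H(f), since the claim does not state an initial value. The function name and the sample values are hypothetical.

```python
def smooth_attenuation(h_frames, gamma):
    """Apply G_n(f) = (1 - gamma) * H_n(f) + gamma * G_{n-1}(f) bin by bin."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("the forgetting factor is assumed to lie in [0, 1)")
    smoothed = []
    prev = None
    for h in h_frames:
        if prev is None:
            # Assumption: no history exists yet, so start from H(f) itself.
            g = list(h)
        else:
            g = [(1.0 - gamma) * hf + gamma * gf for hf, gf in zip(h, prev)]
        smoothed.append(g)
        prev = g
    return smoothed
```

A larger γ gives more weight to the previous frame, so the smoothed gains react more slowly to changes in H(f).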

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (successive time) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (second noise, noise filter) and the complementary non-stationarity parameter .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time (first frequency, activity prediction parameter, updating noise energy estimates) frames , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (successive time) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time (first frequency, activity prediction parameter, updating noise energy estimates) frames , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .
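
The condition recited in claim 27 of US8990073B2 above can be sketched as a simple gate. This is an illustrative sketch only: the function name, parameter names, and any threshold values passed in are hypothetical, since the claim speaks only of a first and a second "given fixed threshold".

```python
def noise_update_allowed(activity_prediction, nonstationarity,
                         activity_threshold, nonstationarity_threshold):
    """Return False when both parameters exceed their thresholds at the same time."""
    prevent = (activity_prediction > activity_threshold
               and nonstationarity > nonstationarity_threshold)
    # The noise energy estimates are updated only when the update is not prevented.
    return not prevent
```

Because both comparisons must hold simultaneously, exceeding only one threshold does not block the update in this sketch.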

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (successive time) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time (first frequency, activity prediction parameter, updating noise energy estimates) frames , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .
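
The steps recited in claim 28 of US8990073B2 above can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes each group's "energy value" is the sum of its band energies and that the long-term value is a first-order recursive average with a hypothetical update factor `alpha`. Neither choice is stated in the claim, and all names are hypothetical.

```python
def noise_character(band_energies, n_first, eps=1e-12):
    """Ratio of the energy of the first n_first bands to the energy of the rest."""
    first_energy = sum(band_energies[:n_first])    # first group of bands
    second_energy = sum(band_energies[n_first:])   # rest of the bands
    # eps is a hypothetical guard against division by zero.
    return first_energy / max(second_energy, eps)


def update_long_term(long_term, current, alpha):
    """Assumed long-term update: alpha * previous + (1 - alpha) * current."""
    return alpha * long_term + (1.0 - alpha) * current
```

Calling `update_long_term` once per frame with the latest `noise_character` value yields a slowly varying long-term parameter.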

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (second noise, noise filter) using a frequency spectrum (time frames) of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time frames (frequency spectrum) , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .
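
For orientation, the first three elements of claim 30 of US8990073B2 above (residual spectrum, peak detection, correlation map) can be sketched as below. This is a rough sketch under stated assumptions, not the patentee's method: the spectral floor is assumed to be a piecewise-linear curve connecting the minima, both spectrum endpoints are assumed to act as minima, and the correlation is assumed to be a normalized squared cross-correlation. All function names are hypothetical.

```python
def local_minima(spectrum):
    """Indices of local minima; both endpoints are included (assumption)."""
    if len(spectrum) < 2:
        raise ValueError("need at least two frequency bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a floor that linearly connects successive minima (assumption)."""
    minima = local_minima(spectrum)
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima


def correlation_map(curr_res, prev_res, minima):
    """Per-peak correlation with the same bins of the previous residual.

    A peak is the piece of the residual between two successive minima; every
    bin of a peak receives that peak's correlation value.
    """
    cmap = [0.0] * len(curr_res)
    for a, b in zip(minima, minima[1:]):
        cur = curr_res[a:b + 1]
        prv = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prv))
        den = sum(c * c for c in cur) * sum(p * p for p in prv)
        value = (num * num) / den if den > 0 else 0.0
        for i in range(a, b + 1):
            cmap[i] = value
    return cmap
```

In this sketch a peak whose shape is unchanged from the previous frame scores 1, while a peak with no counterpart in the previous residual scores 0.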

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (second noise, noise filter) using a frequency spectrum (time frames) of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time frames (frequency spectrum) , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum (time frames) of the sound signal (second noise, noise filter) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time frames (frequency spectrum) , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n'th and (n−1)'th time frames , and γ is a forgetting factor .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (periodic signal) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US7016507B1
CLAIM 19
. A method as claimed in claim 1 , wherein detecting the presence or absence of speech comprises : (1) taking a block of the input signal and performing an auto-correlation on that block to form a correlated signal ;
and , (2) checking the correlated signal for the presence of a periodic signal (frequency bin) having a pitch corresponding to that for a desired audio signal .
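
The filtering and summing recited in claim 33 of US8990073B2 above can be sketched as follows. This is a minimal illustration that assumes the per-bin filter is a first-order recursive average driven by an update factor `alpha`, and that the long-term map starts from a caller-supplied initial value. The claim does not fix the filter form, and the names are hypothetical.

```python
def update_long_term_map(long_term, cmap, alpha):
    """Filter the correlation map bin by bin: alpha * old + (1 - alpha) * new."""
    if len(long_term) != len(cmap):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

A typical use would call `update_long_term_map` once per frame and then compare `summed_long_term_map` against a decision threshold, whose value is left open here.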

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (second noise, noise filter) .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 35
. A device for detecting sound activity (high frequencies) in a sound signal (second noise, noise filter) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (second noise, noise filter) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor {circumflex over (β)}(f) , the preemphasis function being such as to reduce {circumflex over (β)}(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|{circumflex over (N)}(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 36
. A device for detecting sound activity (high frequencies) in a sound signal (second noise, noise filter) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (second noise, noise filter) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (high frequencies) detector .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (high frequencies) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (successive time) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (high frequencies) detector .
US7016507B1
CLAIM 10
. A method as claimed in claim 9 , wherein the oversubtraction factor β is divided by a preemphasis function P(f) to give a modified oversubtraction factor β̂(f) , the preemphasis function being such as to reduce β̂(f) at high frequencies (sound activity) , and thereby reduce attenuation at high frequencies .

US7016507B1
CLAIM 12
. A method as claimed in claim 8 , wherein the attenuation function (H(f)) is calculated at successive time (first frequency, activity prediction parameter, updating noise energy estimates) frames , and the attenuation function (H(f)) is calculated in accordance with the following equation : G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f) , wherein G_n(f) and G_{n−1}(f) are the smoothed attenuation functions at the n-th and (n−1)-th time frames , and γ is a forgetting factor .
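
The recursive smoothing recited in US7016507B1 claim 12 above can be illustrated with a minimal sketch. It assumes the update reads G_n(f) = (1 − γ) H(f) + γ G_{n−1}(f), applied independently to each frequency bin. The function name `smooth_attenuation` and the list-based representation of H(f) and G(f) are illustrative assumptions, not taken from the reference.

```python
def smooth_attenuation(h, g_prev, gamma):
    """One smoothing step per frequency bin.

    h      -- attenuation function H(f) for the current frame (list of floats)
    g_prev -- smoothed attenuation G_{n-1}(f) from the previous frame
    gamma  -- forgetting factor, assumed here to lie in [0, 1]
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is assumed to lie in [0, 1]")
    if len(h) != len(g_prev):
        raise ValueError("h and g_prev must cover the same frequency bins")
    # G_n(f) = (1 - gamma) * H(f) + gamma * G_{n-1}(f)
    return [(1.0 - gamma) * hf + gamma * gf for hf, gf in zip(h, g_prev)]


if __name__ == "__main__":
    g = [0.0, 1.0]                      # hypothetical previous-frame values
    for frame in ([1.0, 0.0], [1.0, 0.0]):
        g = smooth_attenuation(frame, g, gamma=0.25)
        print(g)
```

A larger γ weights the previous frame more heavily, so the smoothed function reacts more slowly to frame-to-frame changes in H(f).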

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (second noise, noise filter) for distinguishing a music signal (second noise, noise filter) from a background noise signal and preventing update of noise energy estimates .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (second noise, noise filter) .
US7016507B1
CLAIM 24
. An apparatus as claimed in claim 23 , wherein the input signal contains speech and the main noise reduction unit comprises : (1) a detector connected to said input and providing a detection signal indicative of the presence of speech ;
(2) magnitude means for determining the magnitude spectrum of the input signal (|X(f)|) , with both the detector and the magnitude means being connected to the input of the apparatus ;
(3) spectral estimate means for generating a noise magnitude spectral estimate (|N̂(f)|) and being connected to the detector and to the input of the apparatus ;
(4) a noise filter (sound signal, music signal) calculation unit connected to the spectral estimate means and the magnitude means , for receiving the noise magnitude spectral estimate (|N̂(f)|) and magnitude spectrum of the input signal (|X(f)|) and calculating an attenuation function (H(f)) ;
and , (5) a multiplication unit coupled to the noise filter calculation unit and the input signal for producing the noise reduced signal .

US7016507B1
CLAIM 32
. An apparatus , for reducing noise in an input signal , the apparatus including an input for receiving the input signal , the apparatus comprising : (a) a compression circuit for receiving a compression control signal and generating an amplification control signal in response ;
(b) an amplification unit for receiving an input amplification signal and the amplification control signal and generating an output signal with compression and reduced noise under the control of the amplification control signal ;
(c) an auxiliary noise reduction unit connected to the input for generating an auxiliary noise reduced signal , the compression control signal being the auxiliary noise reduced signal ;
and , (d) a main noise reduction unit connected to the input and the amplification unit for receiving the input signal and generating a noise reduced signal , the input amplification signal being the noise reduced signal ;
wherein , the main noise reduction unit employs a first noise reduction algorithm and the auxiliary noise reduction unit employs a second noise (sound signal, music signal) reduction algorithm , the second noise reduction algorithm being adapted to attack noise more aggressively than the first noise reduction algorithm .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6317501B1

Filed: 1998-03-16     Issued: 2001-11-13

Microphone array apparatus

(Original Assignee) Fujitsu Ltd     (Current Assignee) Fujitsu Ltd

Naoshi Matsuo
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .
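
For orientation, the peak detection, correlation map and long-term correlation map steps recited in US8990073B2 claim 1 above can be sketched as follows. This is a hedged illustration only. The squared normalized cross-correlation, the first-order recursive update, the zero initial value and all function names are assumptions, since the claim requires only that the long-term map depend on an update factor, the current correlation map and an initial value.

```python
def find_minima(spec):
    """Indices of local minima; both end bins are always included (assumption)."""
    n = len(spec)
    if n == 0:
        return []
    idx = [0]
    for i in range(1, n - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def correlation_map(cur, prev):
    """Per-bin map: every bin of a peak receives that peak's correlation value.

    A peak is the piece of `cur` between two successive minima. It is compared
    with the shape at the same bin positions in `prev`. A minimum shared by two
    peaks keeps the value of the later peak (an arbitrary choice of this sketch).
    """
    cmap = [0.0] * len(cur)
    minima = find_minima(cur)
    for a, b in zip(minima, minima[1:]):
        x, y = cur[a:b + 1], prev[a:b + 1]
        num = sum(p * q for p, q in zip(x, y)) ** 2
        den = sum(p * p for p in x) * sum(q * q for q in y)
        value = num / den if den > 0.0 else 0.0
        for k in range(a, b + 1):
            cmap[k] = value
    return cmap


def update_long_term(lt_prev, cmap, alpha):
    """Assumed recursion: lt(k) = alpha * lt_prev(k) + (1 - alpha) * cmap(k)."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_prev, cmap)]


if __name__ == "__main__":
    residual = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0, 0.0]   # hypothetical residual spectrum
    lt = [0.0] * len(residual)                        # assumed initial value
    lt = update_long_term(lt, correlation_map(residual, residual), alpha=0.5)
    print(lt)
```

Peaks whose shape repeats from frame to frame drive the long-term map towards 1 in this sketch, which is the sense in which the map indicates tonal stability.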

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (microphone array) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .
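
The residual-spectrum calculation of US8990073B2 claim 2 above (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as below. Linear interpolation between successive minima and the treatment of the two end bins as minima are assumptions of this sketch. The claim states only that the minima are connected with each other. All function names are hypothetical.

```python
def find_minima(spec):
    """Indices of local minima; both end bins are always included (assumption)."""
    n = len(spec)
    if n == 0:
        return []
    idx = [0]
    for i in range(1, n - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spec):
    """Piecewise-linear floor obtained by joining successive minima."""
    floor = list(spec)
    minima = find_minima(spec)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spec[a] + t * (spec[b] - spec[a])
    return floor


def residual_spectrum(spec):
    """Frequency spectrum minus the estimated spectral floor."""
    return [s - f for s, f in zip(spec, spectral_floor(spec))]


if __name__ == "__main__":
    spectrum = [1.0, 3.0, 1.0, 4.0, 2.0]   # hypothetical magnitude spectrum
    print(spectral_floor(spectrum))
    print(residual_spectrum(spectrum))
```

By construction the residual is zero at every minimum, so each piece between two successive zeros corresponds to one peak in the sense of claim 1.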

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (microphone array) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (microphone array) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity (microphone array) in the sound signal .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 10
. A method for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (microphone array) is detected .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (microphone array) in the sound signal (microphone array) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection comprises detecting the sound signal (microphone array) based on a frequency dependent signal-to-noise ratio (SNR) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection in the sound signal (microphone array) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (microphone array) detection further comprises updating the noise estimates for a next frame .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (microphone array) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (microphone array) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .
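
The SNR-based decision recited in US8990073B2 claims 14, 18 and 19 above (compare an average SNR with a threshold that is a function of a long-term SNR) can be illustrated as follows. The claims do not give the threshold function, so the linear form, its `slope` and `offset` values, and the handling of the equality case are purely hypothetical placeholders.

```python
def vad_threshold(snr_lt, slope=0.5, offset=2.0):
    """Hypothetical threshold: a linear function of the long-term SNR (dB)."""
    return slope * snr_lt + offset


def classify_frame(snr_av, snr_lt):
    """Return 'active' when SNR_av exceeds the threshold, else 'inactive'.

    Claim 18 covers SNR_av below the threshold and claim 19 covers SNR_av
    above it. Treating equality as inactive is an assumption of this sketch.
    """
    return "active" if snr_av > vad_threshold(snr_lt) else "inactive"


if __name__ == "__main__":
    for snr_av in (5.0, 15.0):          # hypothetical average SNR values in dB
        print(snr_av, classify_frame(snr_av, snr_lt=20.0))
```

Making the threshold depend on the long-term SNR lets the decision adapt to the prevailing noise level rather than relying on one fixed value.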

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (microphone array) prevents updating of noise energy estimates when a music signal is detected .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (microphone array) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .
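
The spectral diversity calculation of US8990073B2 claim 24 above (per-band energy ratio between the current and previous frames, summed with weights over the bands above a given number) can be sketched as below. The uniform default weights, the reading of "higher than a given number" as a band index strictly greater than `min_band`, and the small `eps` guard against division by zero are assumptions. The function name is hypothetical.

```python
def spectral_diversity(e_cur, e_prev, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above min_band.

    e_cur, e_prev -- per-band energies of the current and previous frames
    min_band      -- only bands with index > min_band contribute (assumption)
    weights       -- per-band weights; uniform weights of 1.0 if omitted
    """
    if len(e_cur) != len(e_prev):
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * len(e_cur)
    total = 0.0
    for i in range(min_band + 1, len(e_cur)):
        ratio = e_cur[i] / max(e_prev[i], eps)   # guard against empty bands
        total += weights[i] * ratio
    return total


if __name__ == "__main__":
    current = [1.0, 2.0, 4.0, 8.0]      # hypothetical band energies
    previous = [1.0, 1.0, 2.0, 2.0]
    print(spectral_diversity(current, previous, min_band=1))
```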

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (microphone array) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (microphone array) and the complementary non-stationarity parameter .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (microphone array) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (microphone array) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
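
The three elements of claim 32 (locating minima, connecting them into a spectral floor, subtracting the floor) can be sketched as follows. This is only an illustrative reading, not the patentee's implementation: the function names are hypothetical, the floor is assumed to be a piecewise-linear interpolation between minima, and the first and last bins are assumed to count as minima so that the floor spans the whole spectrum.

```python
def local_minima(spectrum):
    """Return indices of local minima.

    Assumption: the first and last bins are always included so that the
    floor is defined over every bin.
    """
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

Under these assumptions the residual is zero at every minimum, so each piece between two successive minima is one candidate peak in the sense of claim 31.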
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (microphone array) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 35
. A device for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 36
. A device for detecting sound activity (microphone array) in a sound signal (microphone array) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (microphone array) detector .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (microphone array) detector comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
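
The comparator of claim 38 can be illustrated with a short sketch. The claim only says that the threshold is a function of the long-term SNR; the linear form and the constants `slope` and `offset` below are hypothetical placeholders chosen for illustration, not values taken from the patent.

```python
def snr_threshold(snr_lt, slope=0.1, offset=1.0):
    """Hypothetical threshold: a linear function of the long-term SNR.

    The claim does not specify the function; slope and offset are
    illustrative assumptions only.
    """
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR with the long-term-SNR-dependent threshold."""
    return snr_av > snr_threshold(snr_lt)
```

With these placeholder constants, a long-term SNR of 20 gives a threshold of 3, so an average SNR of 5 would be flagged as active and an average SNR of 2 would not.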
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (microphone array) detector .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (microphone array) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (microphone array) .
US6317501B1
CLAIM 1
. A microphone array (sound signal, sound activity, sound activity detector) apparatus comprising : a microphone array including microphones , one of the microphones being a reference microphone ;
filters receiving output signals of the microphones ;
and a filter coefficient calculator which receives the output signals of the microphones , a noise and a residual signal obtained by subtracting filtered output signals of those of the microphones other than the reference microphone from a filtered output signal of the reference microphone and which obtain filter coefficients of the filters in accordance with an evaluation function represented by power of the residual signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6134518A

Filed: 1998-03-04     Issued: 2000-10-17

Digital audio signal coding using a CELP coder and a transform coder

(Original Assignee) International Business Machines Corp     (Current Assignee) Cisco Technology Inc

Gilad Cohen, Yossef Cohen, Doron Hoffman, Hagai Krupnik, Aharon Satt
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map (signal processor, decoding method) .
US6134518A
CLAIM 6
. Apparatus for digitally decoding an input signal comprising coded data for a series of frames of audio data , comprising : logic to detect an indication in the coded data stream for each frame as to whether the frame has been encoded using a first coder or a second coder ;
first and second decoders for digitally decoding the input signal using first and second decoding method (second group, sound activity detector, term correlation map, second energy values) s respectively ;
a switching arrangement , for each frame , directing the generation of an output signal by decoding the input signal using either the first or second decoders according to the detected indication ;
and wherein the first decoder is a CELP decoder and the second decoder is a transform decoder and when switching from the mode of operation of decoding CELP encoded frames to transform encoded frames , the transform coder uses the information in an extended CELP frame when decoding the first frame encoded using the transform coder .

US6134518A
CLAIM 13
. A computer program product which includes suitable program code means for causing a general purpose computer or digital signal processor (second group, sound activity detector, term correlation map, second energy values) to perform a method as claimed in claim 7 .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (correlation value, more threshold) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
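
A minimal sketch of the correlation-map steps recited in claim 4, assuming the minima indices of the current residual spectrum are already known. The normalization used here (cross-correlation divided by the geometric mean of the two energies) is an assumption; the claim says only "normalized correlation value". The function name is hypothetical.

```python
import math


def correlation_map(cur, prev, minima):
    """Assign each peak's normalized correlation to all bins of that peak.

    cur, prev : current and previous residual spectra (same length)
    minima    : indices of successive minima in cur; each consecutive
                pair delimits one peak
    A bin shared by two neighbouring peaks keeps the later peak's score.
    """
    cmap = [0.0] * len(cur)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur[i] * prev[i] for i in bins)
        den = math.sqrt(
            sum(cur[i] ** 2 for i in bins) * sum(prev[i] ** 2 for i in bins)
        )
        score = num / den if den > 0.0 else 0.0  # score of this peak
        for i in bins:
            cmap[i] = score
    return cmap
```

If the previous frame has the same peak shape at the same position, the score for that peak approaches 1; if the previous residual is empty there, the score is 0.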
US6134518A
CLAIM 2
. Apparatus as claimed in claim 1 , wherein the distinguishing parameter comprises an autocorrelation value (correlation value) .

US6134518A
CLAIM 5
. Apparatus as claimed in claim 1 , comprising means arranged to compare the averaged speech probability value with one or more threshold (correlation value) s to determine the state of each frame .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (method steps) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
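
The two steps of claim 5 can be sketched as a per-bin one-pole (first-order recursive) smoother followed by a sum over bins. The update factor `alpha` and its default value are assumptions for illustration, as is the choice of starting the long-term map from zeros; claim 1 requires only that an update factor, the current correlation map, and an initial value be used.

```python
def update_long_term_map(lt_map, cmap, alpha=0.9):
    """One-pole filter applied independently to every frequency bin.

    lt_map : long-term correlation map from the previous frame
             (the initial value on the first frame)
    cmap   : correlation map of the current frame
    alpha  : update factor (hypothetical default)
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(lt_map, cmap)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)
```

A caller would keep `lt_map` as state across frames, feed each new correlation map through `update_long_term_map`, and compare the summed value with a decision threshold of its own choosing.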
US6134518A
CLAIM 20
. A program storage device readable by machine , tangibly embodying a program of instructions executable by the machine to perform method steps (frequency bin basis) for causing a digitally encoding of an input audio signal for storage or transmission wherein the input audio signal comprises a series of signal samples ordered in time and divided into frames , said method steps comprising : measuring a distinguishing parameter from the input signal , determining from the measured distinguishing parameter whether the input signal contains an audio signal of a first type or a second type ;
and generating an output signal by encoding the input signal using either first or second coding methods according to whether the input signal contains an audio signal of the first type or the second type at that time , wherein the first coding method is CELP coding and the second coding method is transform coding , and wherein the input signal is coded on a frame-by-frame basis , the transform coding comprising encoding a frame using a discrete frequency domain transform of a range of samples from a plurality of neighboring frames , and wherein the CELP coding comprises generating the last CELP encoded frame prior to a switch from a mode of operation in which frames are encoded using the CELP coding to a mode of operation in which frames are encoded using transform coding by encoding an extended frame , the extended frame covering the same range of samples as the transform coding , so that a transform decoder can generate the information required to decode the first frame encoded using the transform coding from the last CELP encoded frame .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (signal processor, decoding method) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values (signal processor, decoding method) so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
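
The noise character parameter of claim 28 amounts to an energy ratio between two groups of frequency bands, followed by long-term smoothing. The sketch below assumes per-band energies are already available, that the first group is simply the first `n_first` bands, and that the long-term value is an exponential average with a hypothetical factor `alpha`; none of these details is fixed by the claim.

```python
def noise_character(band_energies, n_first):
    """Ratio of the first group's energy to the energy of the remaining bands."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0.0 else float("inf")


def long_term_noise_character(prev_lt, current, alpha=0.9):
    """Exponential average of the ratio (assumed form of the long-term value)."""
    return alpha * prev_lt + (1.0 - alpha) * current
```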
US6134518A
CLAIM 6
. Apparatus for digitally decoding an input signal comprising coded data for a series of frames of audio data , comprising : logic to detect an indication in the coded data stream for each frame as to whether the frame has been encoded using a first coder or a second coder ;
first and second decoders for digitally decoding the input signal using first and second decoding method (second group, sound activity detector, term correlation map, second energy values) s respectively ;
a switching arrangement , for each frame , directing the generation of an output signal by decoding the input signal using either the first or second decoders according to the detected indication ;
and wherein the first decoder is a CELP decoder and the second decoder is a transform decoder and when switching from the mode of operation of decoding CELP encoded frames to transform encoded frames , the transform coder uses the information in an extended CELP frame when decoding the first frame encoded using the transform coder .

US6134518A
CLAIM 13
. A computer program product which includes suitable program code means for causing a general purpose computer or digital signal processor (second group, sound activity detector, term correlation map, second energy values) to perform a method as claimed in claim 7 .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (method steps) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6134518A
CLAIM 20
. A program storage device readable by machine , tangibly embodying a program of instructions executable by the machine to perform method steps (frequency bin basis) for causing a digitally encoding of an input audio signal for storage or transmission wherein the input audio signal comprises a series of signal samples ordered in time and divided into frames , said method steps comprising : measuring a distinguishing parameter from the input signal , determining from the measured distinguishing parameter whether the input signal contains an audio signal of a first type or a second type ;
and generating an output signal by encoding the input signal using either first or second coding methods according to whether the input signal contains an audio signal of the first type or the second type at that time , wherein the first coding method is CELP coding and the second coding method is transform coding , and wherein the input signal is coded on a frame-by-frame basis , the transform coding comprising encoding a frame using a discrete frequency domain transform of a range of samples from a plurality of neighboring frames , and wherein the CELP coding comprises generating the last CELP encoded frame prior to a switch from a mode of operation in which frames are encoded using the CELP coding to a mode of operation in which frames are encoded using transform coding by encoding an extended frame , the extended frame covering the same range of samples as the transform coding , so that a transform decoder can generate the information required to decode the first frame encoded using the transform coding from the last CELP encoded frame .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (signal processor, decoding method) .
US6134518A
CLAIM 6
. Apparatus for digitally decoding an input signal comprising coded data for a series of frames of audio data , comprising : logic to detect an indication in the coded data stream for each frame as to whether the frame has been encoded using a first coder or a second coder ;
first and second decoders for digitally decoding the input signal using first and second decoding method (second group, sound activity detector, term correlation map, second energy values) s respectively ;
a switching arrangement , for each frame , directing the generation of an output signal by decoding the input signal using either the first or second decoders according to the detected indication ;
and wherein the first decoder is a CELP decoder and the second decoder is a transform decoder and when switching from the mode of operation of decoding CELP encoded frames to transform encoded frames , the transform coder uses the information in an extended CELP frame when decoding the first frame encoded using the transform coder .

US6134518A
CLAIM 13
. A computer program product which includes suitable program code means for causing a general purpose computer or digital signal processor (second group, sound activity detector, term correlation map, second energy values) to perform a method as claimed in claim 7 .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (signal processor, decoding method) comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6134518A
CLAIM 6
. Apparatus for digitally decoding an input signal comprising coded data for a series of frames of audio data , comprising : logic to detect an indication in the coded data stream for each frame as to whether the frame has been encoded using a first coder or a second coder ;
first and second decoders for digitally decoding the input signal using first and second decoding method (second group, sound activity detector, term correlation map, second energy values) s respectively ;
a switching arrangement , for each frame , directing the generation of an output signal by decoding the input signal using either the first or second decoders according to the detected indication ;
and wherein the first decoder is a CELP decoder and the second decoder is a transform decoder and when switching from the mode of operation of decoding CELP encoded frames to transform encoded frames , the transform coder uses the information in an extended CELP frame when decoding the first frame encoded using the transform coder .

US6134518A
CLAIM 13
. A computer program product which includes suitable program code means for causing a general purpose computer or digital signal processor (second group, sound activity detector, term correlation map, second energy values) to perform a method as claimed in claim 7 .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (signal processor, decoding method) .
US6134518A
CLAIM 6
. Apparatus for digitally decoding an input signal comprising coded data for a series of frames of audio data , comprising : logic to detect an indication in the coded data stream for each frame as to whether the frame has been encoded using a first coder or a second coder ;
first and second decoders for digitally decoding the input signal using first and second decoding method (second group, sound activity detector, term correlation map, second energy values) s respectively ;
a switching arrangement , for each frame , directing the generation of an output signal by decoding the input signal using either the first or second decoders according to the detected indication ;
and wherein the first decoder is a CELP decoder and the second decoder is a transform decoder and when switching from the mode of operation of decoding CELP encoded frames to transform encoded frames , the transform coder uses the information in an extended CELP frame when decoding the first frame encoded using the transform coder .

US6134518A
CLAIM 13
. A computer program product which includes suitable program code means for causing a general purpose computer or digital signal processor (second group, sound activity detector, term correlation map, second energy values) to perform a method as claimed in claim 7 .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5978824A

Filed: 1998-01-29     Issued: 1999-11-02

Noise canceler

(Original Assignee) NEC Corp     (Current Assignee) NEC Corp

Shigeji Ikeda
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first maximum value) of the long term correlation map .
US5978824A
CLAIM 3
. A noise canceler as claimed in claim 1 , wherein said step size outputting means comprises : means for inputting said estimated value of the power ratio of the main signal to the noise signal to a preselected monotonously increasing function to thereby calculate a first function value ;
and means for outputting as said first step size said first function value when said first function value is between a first maximum value (initial value, correlation value, term value, first energy value) and a first minimum value , or outputting said first maximum value when said first function value is greater than said first maximum value , or outputting said first minimum value when said first function value is smaller than said first minimum value .
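
For comparison with the target claim, the step-size logic of US5978824A claim 3 is a clamp on a monotonically increasing function of the estimated power ratio. In this sketch the function `func` and the bounds are supplied by the caller, since the claim names no particular function or limits; the function name is hypothetical.

```python
def first_step_size(snr_estimate, func, min_step, max_step):
    """Evaluate func at the estimated power ratio and clamp the result.

    func is assumed to be monotonically increasing, as the claim requires;
    this sketch does not check that property.
    """
    value = func(snr_estimate)
    return max(min_step, min(max_step, value))
```

For example, with `func = lambda x: 0.5 * x` and bounds 0.1 and 0.8, an estimate of 1.0 yields 0.5, an estimate of 10.0 saturates at 0.8, and an estimate of 0.0 is raised to 0.1.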

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (first maximum value) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5978824A
CLAIM 3
. A noise canceler as claimed in claim 1 , wherein said step size outputting means comprises : means for inputting said estimated value of the power ratio of the main signal to the noise signal to a preselected monotonously increasing function to thereby calculate a first function value ;
and means for outputting as said first step size said first function value when said first function value is between a first maximum value (initial value, correlation value, term value, first energy value) and a first minimum value , or outputting said first maximum value when said first function value is greater than said first maximum value , or outputting said first minimum value when said first function value is smaller than said first minimum value .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (second noise signal, first noise signal) ;

wherein the tonal stability estimation is performed according to claim 1 .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal (background noise signal) output ;
a first adaptive filter for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal (background noise signal) output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .
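The power averaging and dividing steps recited in claim 2 of US5978824A can be illustrated with a minimal sketch. It assumes block-wise averaging over a list of samples; the function names and the small `eps` guard against division by zero are hypothetical and do not come from the reference.

```python
def mean_square(samples):
    """Square mean (average power) of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def power_ratio(desired, noise, eps=1e-12):
    """Desired-signal power divided by noise power.

    eps is an assumed guard against a zero denominator.
    """
    return mean_square(desired) / (mean_square(noise) + eps)


# Illustrative data only: desired power is 1.0, noise power is 0.25.
desired = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
snr = power_ratio(desired, noise)
```

The same ratio would be computed once for the main signal and once for the reference signal, each from its own pair of power averages.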

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (adaptive filter) when a tonal sound signal is detected .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates (adaptive filter) calculated in a previous frame in a SNR calculation (noise power) .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power (SNR LT, SNR calculation) ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (adaptive filter) for a next frame .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (adaptive filter) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction (first power) residual error energies .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power (linear prediction) averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (adaptive filter) when a music signal is detected .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal (second noise signal, first noise signal) and prevent update of noise energy estimates (adaptive filter) on the music signal .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal (background noise signal) output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal (background noise signal) output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
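The two-threshold condition in claim 27 reduces to a logical AND of two comparisons. The sketch below is illustrative only: the function name and both threshold values are hypothetical placeholders, since the claim recites "given fixed" thresholds without stating numbers.

```python
def prevent_noise_update(activity_prediction, comp_nonstationarity,
                         thr_activity=0.8, thr_nonstat=5.0):
    """Return True when the noise energy update should be blocked.

    Both parameters must exceed their thresholds at the same time.
    The default threshold values are assumptions for illustration.
    """
    return (activity_prediction > thr_activity
            and comp_nonstationarity > thr_nonstat)
```

With these assumed thresholds, `prevent_noise_update(0.9, 6.0)` blocks the update, while exceeding only one threshold does not.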
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (time t) bands and a second group of a rest of the frequency bands ;

calculating a first energy value (first maximum value) for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
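The steps of claim 28 can be sketched as follows. This is a rough illustration under stated assumptions: the ratio is taken as first-group energy over second-group energy (the claim does not fix the direction), the long-term value uses exponential smoothing with a hypothetical factor `alpha` (the claim only says "based on"), and all names are invented for the example.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energies[:n_first])    # first group of frequency bands
    second = sum(band_energies[n_first:])   # remaining frequency bands
    return first / max(second, 1e-12)       # assumed guard against division by zero


def update_long_term(previous, current, alpha=0.9):
    """Exponentially smoothed long-term value (assumed smoothing form)."""
    return alpha * previous + (1.0 - alpha) * current


# Illustrative data: two low bands with energy 4.0 each, two high bands with 1.0 each.
ratio = noise_character([4.0, 4.0, 1.0, 1.0], n_first=2)
long_term = update_long_term(2.0, ratio, alpha=0.5)
```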
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time (first frequency) to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 3
. A noise canceler as claimed in claim 1 , wherein said step size outputting means comprises : means for inputting said estimated value of the power ratio of the main signal to the noise signal to a preselected monotonously increasing function to thereby calculate a first function value ;
and means for outputting as said first step size said first function value when said first function value is between a first maximum value (initial value, correlation value, term value, first energy value) and a first minimum value , or outputting said first maximum value when said first function value is greater than said first maximum value , or outputting said first minimum value when said first function value is smaller than said first minimum value .
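The step size selection recited in claim 3 of US5978824A amounts to passing the estimated power ratio through an increasing function and clamping the result. A minimal sketch, assuming `math.log1p` as the monotonically increasing function and hypothetical bounds `lo` and `hi` (the claim names neither the function nor the limits):

```python
import math


def step_size(power_ratio, f=math.log1p, lo=0.01, hi=0.5):
    """Clamp f(power_ratio) to the interval [lo, hi].

    f must be monotonically increasing; log1p and the bounds are
    assumptions chosen only for illustration.
    """
    value = f(power_ratio)
    return min(hi, max(lo, value))
```

A function value between the bounds is passed through unchanged; values above `hi` or below `lo` are replaced by the respective bound.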

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first maximum value) of the long-term correlation map .
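The four elements of claim 30 can be sketched end to end. This is only an illustration under several assumptions that the claim does not state: the spectral floor is drawn as straight lines between successive local minima, the two end bins are treated as minima, the correlation is a squared normalized cross-correlation per peak, and the long-term map is a first-order recursion with a hypothetical update factor `alpha`. All function names are invented for the example, and spectra are assumed to have at least two bins.

```python
def local_minima(spec):
    """Indices of local minima; both end bins are included (assumption)."""
    idx = [0]
    for i in range(1, len(spec) - 1):
        if spec[i] < spec[i - 1] and spec[i] <= spec[i + 1]:
            idx.append(i)
    idx.append(len(spec) - 1)
    return idx


def residual_spectrum(spec):
    """Subtract a floor interpolated linearly between successive minima."""
    mins = local_minima(spec)
    res = [0.0] * len(spec)
    for a, b in zip(mins, mins[1:]):
        for i in range(a, b + 1):
            floor = spec[a] + (spec[b] - spec[a]) * (i - a) / (b - a)
            res[i] = spec[i] - floor
    return res, mins


def correlation_map(res, prev_res, mins):
    """Per-peak squared normalized correlation, written to each bin of the peak."""
    cmap = [0.0] * len(res)
    for a, b in zip(mins, mins[1:]):  # each peak spans two successive minima
        bins = range(a, b + 1)
        num = sum(res[i] * prev_res[i] for i in bins)
        den = sum(res[i] ** 2 for i in bins) * sum(prev_res[i] ** 2 for i in bins)
        value = num * num / den if den > 0 else 0.0
        for i in bins:
            cmap[i] = value
    return cmap


def update_long_term(lt_map, cmap, alpha=0.9):
    """First-order update of the long-term correlation map."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


# Illustrative spectrum with two peaks; the previous frame is identical.
spec = [1.0, 3.0, 1.0, 4.0, 1.0]
res, mins = residual_spectrum(spec)
cmap = correlation_map(res, res, mins)
lt_map = update_long_term([0.0] * len(spec), cmap, alpha=0.5)  # initial value: zeros
```

A spectrum whose peaks repeat from frame to frame drives the long-term map toward 1, which is the sense in which the map indicates tonal stability.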
US5978824A
CLAIM 3
. A noise canceler as claimed in claim 1 , wherein said step size outputting means comprises : means for inputting said estimated value of the power ratio of the main signal to the noise signal to a preselected monotonously increasing function to thereby calculate a first function value ;
and means for outputting as said first step size said first function value when said first function value is between a first maximum value (initial value, correlation value, term value, first energy value) and a first minimum value , or outputting said first maximum value when said first function value is greater than said first maximum value , or outputting said first minimum value when said first function value is smaller than said first minimum value .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first maximum value) of the long-term correlation map .
US5978824A
CLAIM 3
. A noise canceler as claimed in claim 1 , wherein said step size outputting means comprises : means for inputting said estimated value of the power ratio of the main signal to the noise signal to a preselected monotonously increasing function to thereby calculate a first function value ;
and means for outputting as said first step size said first function value when said first function value is between a first maximum value (initial value, correlation value, term value, first energy value) and a first minimum value , or outputting said first maximum value when said first function value is greater than said first maximum value , or outputting said first minimum value when said first function value is smaller than said first minimum value .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (second noise signal, first noise signal) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal (background noise signal) output ;
a first adaptive filter for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal (background noise signal) output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .
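The power averaging and dividing means of US5978824A claim 2 reduce to a mean-square computation followed by a division. The sketch below covers only that final stage and assumes the desired and noise components have already been separated (the claim does this with adaptive filters, which are not modeled here). The zero-power guard is an addition for robustness, not something the claim recites.

```python
from typing import Sequence


def mean_square(samples: Sequence[float]) -> float:
    """Square mean of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def power_ratio(desired: Sequence[float], noise: Sequence[float]) -> float:
    """Desired-signal power divided by noise-signal power."""
    noise_power = mean_square(noise)
    if noise_power == 0.0:
        return float("inf")  # guard added for this sketch
    return mean_square(desired) / noise_power


# Usage: a desired component twice the amplitude of the noise component
# gives a power ratio of 4.
ratio = power_ratio([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0])
```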

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (second noise signal, first noise signal) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal (background noise signal) output ;
a first adaptive filter for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal (background noise signal) output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (adaptive filter) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (second noise signal, first noise signal) and preventing update of noise energy estimates (adaptive filter) .
US5978824A
CLAIM 1
. A noise canceler comprising : first delaying means for delaying a main signal containing a desired signal and a noise signal by a preselected period of time to thereby output a delayed main signal ;
second delaying means for receiving the noise signal as a reference signal and delaying the reference signal by the preselected period of time to thereby output a delayed reference signal ;
first subtracting means for subtracting a first estimated noise signal from said delayed main signal to thereby generate a first desired signal output ;
second subtracting means for subtracting a first estimated desired signal from said delayed reference signal to thereby generate a first noise signal (background noise signal) output ;
a first adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) for receiving said first noise signal output and adaptively estimating a noise signal contained in said delayed main signal to thereby output said first estimated noise signal ;
a second adaptive filter for receiving said first desired signal output and adaptively estimating a desired signal contained in said delayed reference signal to thereby output said first estimated desired signal ;
signal-to-noise power ratio estimating means for receiving said main signal and said reference signal and calculating desired signal power and noise signal power of the main signal and desired signal power and noise signal power of the reference signal to thereby output an estimated value of a power ratio of the main signal to the noise signal and an estimated value of a power ratio of the reference signal to the noise signal ;
and step size outputting means for receiving said estimated values from said signal-to-noise power ratio estimating means to thereby output a first and a second step size representative of an amount of correction of a filter coefficient of said first adaptive filter and an amount of correction of a filter coefficient of said second adaptive filter , respectively .

US5978824A
CLAIM 2
. A noise canceler as claimed in claim 1 , wherein said signal-to-noise power ratio estimating means comprises : third subtracting means for subtracting a second estimated noise signal from the main signal to thereby generate a second desired signal output ;
fourth subtracting means for subtracting a second estimated desired signal from the reference signal to thereby generate a second noise signal (background noise signal) output ;
a third adaptive filter for receiving said second noise signal output and adaptively estimating a noise signal contained in the main signal to thereby output said second estimated noise signal ;
a fourth adaptive filter for receiving said second desired signal output and adaptively estimating a desired signal contained in the reference signal to thereby output a second estimated desired signal ;
first power averaging means for receiving said second desired signal output and producing a square mean of said second desired signal output to thereby output desired signal power of the main signal ;
second power averaging means for receiving said second estimated noise signal and producing a square mean of said second estimated noise signal to thereby output noise signal power of the main signal ;
third power averaging means for receiving said second estimated desired signal and producing a square mean of said second estimated desired signal to thereby output desired signal power of the reference signal ;
fourth power averaging means for receiving said second noise signal output and producing a square mean of said second noise signal to thereby output noise signal power of the reference signal ;
first dividing means for dividing said desired signal power of the main signal by said noise signal power of the main signal to thereby output an estimated value of a power ratio of the main signal to the noise signal ;
and second dividing means for dividing said desired signal power of the reference signal by said noise signal power of the reference signal to thereby output an estimated value of a power ratio of the reference signal to the noise signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6070137A

Filed: 1998-01-07     Issued: 2000-05-30

Integrated frequency-domain voice coding using an adaptive spectral enhancement filter

(Original Assignee) Ericsson Inc     (Current Assignee) Ericsson Inc

Leland S. Bloebaum, Phillip M. Johnson
US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (removing noise) indicative of sound activity in the sound signal .
US6070137A
CLAIM 24
. A method of suppressing noise in a voice encoder , comprising the steps of : converting a received analog audio signal into frames of time-domain audio samples ;
determining presence or absence of speech in a current frame of the time-domain audio samples ;
transforming the frame time-domain audio samples to a frequency-domain representation ;
updating a noise model using the transformed current frame if there is an absence of speech creating a noise suppression filter from the frequency-domain representation ;
and removing noise (adaptive threshold) characteristics from the frequency-domain representation of the current frame using the noise suppression filter and developing a set of spectral magnitudes .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .
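The noise model adapter and spectral estimator of US6070137A claim 1 can be illustrated with a small sketch: the noise model is updated only when the voice activity detector reports no speech, and a per-bin gain then attenuates the noise. The recursive averaging rule, its `rate`, and the gain formula `1 - noise/power` are assumptions chosen for illustration; the claim does not specify how the noise suppression filter is built.

```python
from typing import List


def update_noise_model(noise_model: List[float],
                       frame_power: List[float],
                       speech_present: bool,
                       rate: float = 0.1) -> List[float]:
    """Adapt the per-bin noise power only on frames without speech."""
    if speech_present:
        return list(noise_model)
    return [(1.0 - rate) * n + rate * p for n, p in zip(noise_model, frame_power)]


def suppression_filter(frame_power: List[float], noise_model: List[float]) -> List[float]:
    """Per-bin gain in [0, 1]; a hypothetical subtraction-style filter."""
    return [max(1.0 - n / p, 0.0) if p > 0 else 0.0
            for p, n in zip(frame_power, noise_model)]


def apply_filter(frame_power: List[float], gains: List[float]) -> List[float]:
    """Scale each frequency bin by its gain."""
    return [g * p for g, p in zip(gains, frame_power)]


# Usage: a noise-only frame adapts the model, then a speech frame is filtered.
model = update_noise_model([1.0, 1.0], [1.0, 1.0], speech_present=False, rate=0.5)
frozen = update_noise_model(model, [4.0, 2.0], speech_present=True)
cleaned = apply_filter([4.0, 2.0], suppression_filter([4.0, 2.0], frozen))
```

Gating the update on the speech decision keeps speech energy out of the noise model, which is the behavior the claim's noise model adapter describes.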

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (analog audio signal) is detected .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character (noise character) parameter in order to distinguish a music signal (analog audio signal) from a background noise signal (analog audio signal) and prevent update of noise energy estimates on the music signal .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics (noise character) from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision (linear prediction coefficient) obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6070137A
CLAIM 14
. The system of claim 8 wherein the vector of noise model parameters is comprised of a time domain model such as an autocorrelation function (ACF) or a set of linear prediction coefficients (binary decision) (LPCs) .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character (noise character) parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics (noise character) from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .
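The noise character parameter recited in claim 28 of US8990073B2 above is a ratio of two band-group energies followed by long-term smoothing. A minimal sketch, assuming the ratio is first group over second group and that the long-term value is a first-order recursive average with a hypothetical factor `alpha` (the claim fixes neither):

```python
from typing import Sequence


def noise_character(band_energies: Sequence[float], n_first: int) -> float:
    """Energy of the first n_first bands divided by the energy of the rest."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0 else float("inf")


def long_term_value(previous: float, current: float, alpha: float = 0.9) -> float:
    """Recursive average; alpha is a placeholder update factor."""
    return alpha * previous + (1.0 - alpha) * current


# Usage: four bands split into two groups of two.
nchar = noise_character([2.0, 2.0, 1.0, 1.0], n_first=2)
nchar_lt = long_term_value(0.0, nchar, alpha=0.5)
```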

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character (noise character) parameter inferior than a given fixed threshold .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics (noise character) from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (square root) (SNR_av) with a threshold which is a function of a long-term signal to noise ratio (SNR_LT) .
US6070137A
CLAIM 10
. The system of claim 9 wherein the noise model is stored using the same number of points as the PSD estimate , but wherein the value stored represents square roots (noise ratio) of the values actually used in the PSD estimate .
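The comparator recited in claim 38 of US8990073B2 above compares an average SNR with a threshold that depends on the long-term SNR. The sketch below assumes a linear threshold; the claim only says the threshold "is a function of" the long-term SNR, so the form and the constants `slope` and `offset` are hypothetical.

```python
def snr_based_decision(snr_av: float, snr_lt: float,
                       slope: float = 0.25, offset: float = 1.0) -> bool:
    """Return True (active) when the average SNR exceeds the adaptive threshold."""
    threshold = slope * snr_lt + offset  # assumed linear dependence on SNR_LT
    return snr_av > threshold


# Usage: with a long-term SNR of 20 the threshold is 6.
active = snr_based_decision(8.0, 20.0)
inactive = snr_based_decision(4.0, 20.0)
```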

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character (noise character) of the sound signal for distinguishing a music signal (analog audio signal) from a background noise signal (analog audio signal) and preventing update of noise energy estimates .
US6070137A
CLAIM 1
. A system for encoding voice with integrated noise suppression , comprising : a sampler which converts an analog audio signal (music signal, background noise signal) into frames of time-domain audio samples ;
a voice activity detector operatively coupled to the sampler for determining presence or absence of speech in a current frame ;
a transformer operatively coupled to the sampler for transforming the frame of time-domain audio samples to a frequency-domain representation ;
a noise model adapter operatively associated with the voice activity detector and the transformer for updating a noise model using a current frame if the voice activity detector determines there is an absence of speech ;
a transformer and filter creator operatively coupled to the transformer and the noise model adaptor to create a noise suppression filter ;
and a spectral estimator operatively coupled to the transformer and the transformer and filter creator to remove noise characteristics (noise character) from the frequency-domain representation of the current frame using the noise suppression filter and to develop a set of spectral magnitudes .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6018706A

Filed: 1997-12-29     Issued: 2000-01-25

Pitch determiner for a speech analyzer

(Original Assignee) Motorola Solutions Inc     (Current Assignee) Google Technology Holdings LLC

Jian-Cheng Huang, Floyd Simpson, Xiaojun Li
US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (absolute value) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6018706A
CLAIM 4
. The pitch determiner of claim 1 , wherein said pitch function generator comprises : a squarer for squaring each of the predetermined number of digitized speech samples representing a segment of speech to generating squared digitized speech samples ;
Fast Fourier Transform (FFT) calculator for deriving frequency components corresponding to the predetermined number of squared digitized speech samples representing a segment of speech ;
an absolute value (second group) calculator for calculating an absolute value of the frequency components derived by the FFT calculator ;
and an Inverse Fourier Transform (IFFT) calculator for deriving a plurality of pitch components from the frequency components derived by the FFT calculator .
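
As a rough illustration of the signal chain recited in US6018706A claim 4 (squarer, FFT, absolute value, IFFT), the sketch below uses a naive DFT in place of an FFT. Feeding the magnitudes into the inverse transform is an assumption about how the stages connect, and the names `dft` and `pitch_function` are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here as a stand-in for an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        # The inverse transform carries the 1/n scaling.
        out.append(acc / n if inverse else acc)
    return out


def pitch_function(samples):
    """Square -> DFT -> magnitude -> inverse DFT (assumed ordering of the stages)."""
    squared = [s * s for s in samples]        # squarer
    spectrum = dft(squared)                   # FFT calculator (naive DFT here)
    magnitudes = [abs(c) for c in spectrum]   # absolute value calculator
    # Magnitudes of a real signal are symmetric, so the result is real-valued.
    return [c.real for c in dft(magnitudes, inverse=True)]


# An impulse train with period 4 yields components at lags 0, 4, 8, 12.
components = pitch_function([1, 0, 0, 0] * 4)
```

In this toy input the non-zero lags line up with the period of the signal, which is the property a pitch determiner would look for.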




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5974380A

Filed: 1997-12-16     Issued: 1999-10-26

Multi-channel audio decoder

(Original Assignee) Digital Theater Systems Inc     (Current Assignee) DTS LLC

Stephen Malcolm Smyth, Michael Henry Smyth, William Paul Smith
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
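
The last step of US8990073B2 claim 1 names three inputs for the long-term correlation map: an update factor, the correlation map of the current frame, and an initial value. One plausible reading is a per-bin exponential smoothing, sketched below. The recursion form, the factor 0.9, and the all-zero initial map are assumptions for illustration and are not taken from the claim text.

```python
def update_long_term_map(long_term, current, alpha=0.9):
    """Blend the previous long-term map with the current frame's map, bin by bin.

    alpha is the (assumed) update factor: closer to 1 means slower adaptation.
    """
    if len(long_term) != len(current):
        raise ValueError("maps must cover the same frequency bins")
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, current)]


# Assumed initial value: an all-zero map before the first frame is seen.
long_term = [0.0, 0.0, 0.0]
for frame_map in ([1.0, 0.5, 0.0], [1.0, 0.5, 0.0]):
    long_term = update_long_term_map(long_term, frame_map)
```

Bins whose correlation stays high across frames accumulate toward 1, while bins that correlate only occasionally stay near the initial value.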

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency subbands) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
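
The three steps of US8990073B2 claim 2 (search for minima, connect them into a spectral floor, subtract the floor) can be sketched as follows. Straight-line interpolation between minima, strict interior minima, and a zero residual outside the first and last minimum are all assumptions; `local_minima` and `residual_spectrum` are hypothetical names.

```python
def local_minima(spectrum):
    """Indices of strict interior minima of the spectrum."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1] > spectrum[i] < spectrum[i + 1]]


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the minima."""
    minima = local_minima(spectrum)
    # Outside the first/last minimum the floor equals the spectrum (assumption),
    # so the residual is zero there.
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


residual = residual_spectrum([5, 1, 4, 2, 6, 3, 7])
# Minima sit at bins 1, 3 and 5; the residual is zero at each of them.
```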

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency subbands) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
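
US8990073B2 claim 3 locates a maximum between each pair of consecutive minima of the residual spectrum. A minimal sketch, assuming the minima indices are already known and that ties resolve to the lowest bin; `locate_peaks` is a hypothetical helper name.

```python
def locate_peaks(residual, minima):
    """Return (left_min, right_min, peak_bin) for each pair of consecutive minima."""
    peaks = []
    for left, right in zip(minima, minima[1:]):
        # max() keeps the first index on ties, i.e. the lowest bin.
        top = max(range(left, right + 1), key=lambda i: residual[i])
        peaks.append((left, right, top))
    return peaks


peaks = locate_peaks([0, 0, 2.5, 0, 3.5, 0, 0], minima=[1, 3, 5])
```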

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency subbands) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
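
US8990073B2 claim 4 computes a normalized correlation per peak and writes that score into every bin the peak spans. The sketch below uses a cosine-style normalization, which is an assumption since the claim does not give the formula; `correlation_map` is a hypothetical name.

```python
import math


def correlation_map(current, previous, minima):
    """Per-bin map holding each peak's normalized correlation with the previous frame."""
    cor_map = [0.0] * len(current)
    for left, right in zip(minima, minima[1:]):
        bins = range(left, right + 1)
        num = sum(current[i] * previous[i] for i in bins)
        den = math.sqrt(sum(current[i] ** 2 for i in bins)
                        * sum(previous[i] ** 2 for i in bins))
        score = num / den if den > 0 else 0.0
        # A minimum shared by two peaks takes the score of the later peak.
        for i in bins:
            cor_map[i] = score
    return cor_map


frame = [0, 0, 2, 0, 3, 0, 0]
same_shape = correlation_map(frame, frame, minima=[1, 3, 5])
```

Bins outside the first and last minimum keep a score of zero in this sketch because no peak is delimited there.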

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (differential pulse) indicative of sound activity in the sound signal .
US5974380A
CLAIM 2
. The multi-channel audio decoder of claim 1 , wherein the baseband decoder comprises a plurality of inverse adaptive differential pulse (adaptive threshold) code modulation (ADPCM) coders for decoding the respective subband audio codes , said side information including prediction coefficients for the respective ADPCM coders and a prediction mode (PMODE) for controlling the application of the prediction coefficients to the respective ADPCM coders to selectively enable and disable their prediction capabilities .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band frequency) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency (frequency bands, first frequency bands) range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
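
The spectral diversity parameter of US8990073B2 claim 24 is a weighted sum of per-band energy ratios (current frame over previous frame) for bands above a given index. A minimal sketch under stated assumptions: the ratio direction, unit weights by default, and the small `eps` guard against division by zero are illustrative choices, not claim text.

```python
def spectral_diversity(curr_energy, prev_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios over bands with index > min_band."""
    n = len(curr_energy)
    if len(prev_energy) != n:
        raise ValueError("both frames must have the same number of bands")
    if weights is None:
        weights = [1.0] * n  # assumed default: every band counts equally
    return sum(weights[b] * curr_energy[b] / max(prev_energy[b], eps)
               for b in range(min_band + 1, n))


value = spectral_diversity([1, 2, 4, 8], [1, 1, 2, 2], min_band=1)
```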

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band frequency) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency (frequency bands, first frequency bands) range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .
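
The noise character parameter of US8990073B2 claim 28 can be sketched as a band-energy ratio followed by long-term smoothing. Taking the ratio as first group over second group, summing band energies to get each group's energy value, and smoothing with a factor of 0.9 are assumptions made for illustration.

```python
def noise_character(band_energy, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the rest."""
    first = sum(band_energy[:n_first])
    second = sum(band_energy[n_first:])
    if second == 0:
        raise ValueError("second group has no energy; ratio undefined")
    return first / second


def long_term_value(previous, value, alpha=0.9):
    """Assumed exponential smoothing of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * value


ratio = noise_character([2, 2, 1, 1], n_first=2)
smoothed = long_term_value(0.0, ratio, alpha=0.5)
```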

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency subbands) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency subbands) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5974380A
CLAIM 1
. A multi-channel audio decoder for reconstructing multiple audio channels up to a decoder sampling rate from a data stream , in which each audio channel was sampled at an encoder sampling rate that is at least as high as the decoder sampling rate , subdivided into a plurality of frequency subbands (current residual spectrum) , compressed and multiplexed into the data stream at a transmission rate , comprising : an input buffer for reading in and storing the data stream a frame at a time , each of said frames including a sync word , a frame header , an audio header , and at least one subframe , which includes audio side information , a plurality of sub-subframes having baseband audio codes over a baseband frequency range , a block of high sampling rate audio codes over a high sampling rate frequency range , and an unpack sync ;
a demultiplexer that a) detects the sync word , b) unpacks the frame header to extract a window size that indicates a number of audio samples in the frame and a frame size that indicates a number of bytes in the frame , said window size being set as a function of the ratio of the transmission rate to the encoder sampling rate so that the frame size is constrained to be less than the size of the input buffer , c) unpacks the audio header to extract the number of subframes in the frame and the number of encoded audio channels , and d) sequentially unpacks each subframe to extract the audio side information including the number of sub-subframes , demultiplex the baseband audio codes in each sub-subframe into the multiple audio channels and unpack each audio channel into its subband audio codes , demultiplex the high sampling rate audio codes into the multiple audio channels up to the decoder sampling rate and skip the remaining high sampling rate audio codes up to the encoder sampling rate , and detects the unpack sync to verify the end of the subframe ;
a baseband decoder that uses the side information to decode the subband audio codes into reconstructed subband signals a subframe at a time without reference to any other subframes ;
a baseband reconstruction filter that combines each channel's reconstructed subband signals into a reconstructed baseband signal a subframe at a time ;
a high sampling rate decoder that uses the side information to decode the high sampling rate audio codes up to the decoder sampling rate into a reconstructed high sampling rate signal for each audio channel a subframe at a time ;
and a channel reconstruction filter that combines the reconstructed baseband and high sampling rate signals into a reconstructed multi-channel audio signal a subframe at a time .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5960389A

Filed: 1997-11-06     Issued: 1999-09-28

Methods for generating comfort noise during discontinuous transmission

(Original Assignee) Nokia Mobile Phones Ltd     (Current Assignee) Nokia Technologies Oy

Kari Jarvinen, Pekka Kapanen, Vesa Ruoppila, Jani Rotola-Pukkila
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (odd number) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5960389A
CLAIM 36
. A method as in claim 35 , wherein a length N of the averaging period is an odd number (frequency bins, pole filter) , and wherein the median of the ordered set is the ((N+1)/2)th element of the set .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (odd number) so as to produce a summed long-term correlation map .
US5960389A
CLAIM 36
. A method as in claim 35 , wherein a length N of the averaging period is an odd number (frequency bins, pole filter) , and wherein the median of the ordered set is the ((N+1)/2)th element of the set .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (odd number) having a magnitude that exceeds a given fixed threshold .
US5960389A
CLAIM 36
. A method as in claim 35 , wherein a length N of the averaging period is an odd number (frequency bins, pole filter) , and wherein the median of the ordered set is the ((N+1)/2)th element of the set .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (predetermined threshold value) indicative of sound activity in the sound signal .
US5960389A
CLAIM 35
. A method as in claim 31 , wherein the step of replacing includes steps of : forming a set of buffered excitation gain values over the averaging period ;
ordering the set of buffered excitation gain values ;
and performing a median replacement operation in which those L excitation gain values differing the most from the median value , where the difference exceeds a predetermined threshold value (adaptive threshold) , are replaced by the median value of the set .
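
For orientation, the median replacement recited in claims 35 and 36 of US5960389A can be sketched as below. This is only an illustrative reading of the claim language, not the reference's implementation: the function name `median_replace`, its parameters `num_replace` (the "L" values) and `threshold`, and the tie-breaking order are assumptions made for this sketch.

```python
def median_replace(gains, num_replace, threshold):
    """Replace up to `num_replace` outlying gain values by the median.

    Hypothetical helper: names and tie-breaking are assumptions.
    """
    n = len(gains)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd averaging length N")

    # Claim 36: for odd N the median is the ((N+1)/2)th element (1-based)
    # of the ordered set.
    ordered = sorted(gains)
    median = ordered[(n + 1) // 2 - 1]

    # Rank the buffered values by their distance from the median, largest first.
    by_distance = sorted(range(n), key=lambda i: abs(gains[i] - median), reverse=True)

    result = list(gains)
    for i in by_distance[:num_replace]:
        # Claim 35: replace only where the difference exceeds the threshold.
        if abs(gains[i] - median) > threshold:
            result[i] = median
    return result
```

With a buffer of `[1.0, 1.1, 9.0, 0.9, 1.2]`, `num_replace=2` and `threshold=0.5`, only the value 9.0 is replaced by the median 1.1, because the second-ranked candidate differs from the median by less than the threshold.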

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (ordered set) .
US5960389A
CLAIM 36
. A method as in claim 35 , wherein a length N of the averaging period is an odd number , and wherein the median of the ordered set (SNR calculation) is the ((N+1)/2)th element of the set .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order (second order) and a sixteenth order of linear prediction (other parameter) residual error energies .
US5960389A
CLAIM 1
. A method for producing comfort noise (CN) in a digital mobile terminal that uses a discontinuous transmission , comprising the steps of : in response to a speech pause , calculating random excitation spectral control (RESC) parameters ;
transmitting the RESC parameters to a receiver together with predetermined ones of CN parameters (linear prediction residual error energies) ;
receiving the RESC parameters ;
and shaping the spectral content of an excitation using the received RESC parameters prior to applying the excitation to a synthesis filter .

US5960389A
CLAIM 4
. A method as in claim 2 , wherein the speech coder implements a LPC analysis technique of order greater than two , and wherein the step of analyzing is performed by first or second order (second order) LPC analysis .

US5960389A
CLAIM 32
. A method as in claim 31 , wherein the step of replacing includes the steps of : measuring distances of the speech coding parameters from one another between individual frames within the averaging period ;
identifying those speech coding parameters which have the largest distances to the other parameters (linear prediction) within the averaging period ;
and if the distances exceed a predetermined threshold , replacing an identified speech coding parameter with a speech coding parameter which has a smallest measured distance to the other speech coding parameters within the averaging period .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame (quantization index) energy .
US5960389A
CLAIM 15
. A method as in claim 1 , wherein the predetermined ones of the CN parameters are comprised of a Line Spectral Frequency (LSF) residual vector and a CN energy quantization index (average frame) .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (odd number) so as to produce a summed long-term correlation map .
US5960389A
CLAIM 36
. A method as in claim 35 , wherein a length N of the averaging period is an odd number (frequency bins, pole filter) , and wherein the median of the ordered set is the ((N+1)/2)th element of the set .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (gain parameter) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US5960389A
CLAIM 13
. A method as in claim 1 , wherein the predetermined ones of the CN parameters are comprised of synthesis filter coefficients and gain parameters (noise ratio) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6140809A

Filed: 1997-07-30     Issued: 2000-10-31

Spectrum analyzer

(Original Assignee) Advantest Corp     (Current Assignee) Advantest Corp

Wataru Doi
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the method comprising : calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long term correlation map .
US6140809A
CLAIM 1
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
and demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
whereby a demodulated output signal from the demodulation means is monitored in a time domain while a frequency spectrum (frequency spectrum) of said input signal is displayed in a frequency domain simultaneously .

US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US6140809A
CLAIM 7
. A spectrum analyzer comprising : an earlier stage frequency converter wherein an input signal to be measure is mixed with a first local oscillation signal generated by a first local oscillator so that a first intermediate frequency signal is extracted ;
a controller controlling said earlier stage frequency converter such that a frequency sweep of the first local oscillation signal supplied to said earlier stage frequency converter is fixed to tune to a frequency of an input signal to be measured ;
a later stage frequency converter supplied with the first intermediate frequency signal from said first stage (initial value) frequency converter , wherein a frequency of a second local oscillation signal generated by a second local oscillator and supplied to the later stage frequency converter is swept , and the first intermediate frequency signal is mixed with said second local oscillation signal so that a second intermediate frequency signal is extracted : processing means for processing an output signal from said later stage frequency converter to obtain data of a frequency component of the signal under processing ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for effecting demodulation on the signal from the earlier stage frequency converter which is branched by the branch means ;
control means responsive to said demodulation means for controlling generation of the local oscillation signal supplied to the later stage frequency converter ;
and display means for displaying a frequency spectrum of said input signal in a frequency domain from the data of the processing means and monitoring an output signal from the demodulation means in a time domain simultaneously .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency band) comprises : searching for the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
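
The three steps of claim 2 of US8990073B2 (minima search, spectral floor, subtraction) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patentee's implementation: the minima are "connected" by straight line segments, bins outside the first and last minimum are given a residual of zero, and all function names are hypothetical.

```python
def find_minima(spectrum):
    """Indices of local minima (lower than the left neighbour, not above the right)."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight lines (assumed interpolation)."""
    # Assumption: outside the first/last minimum the floor equals the spectrum,
    # so the residual is zero there.
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Return (residual, minima) for one frame's spectrum."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [s - f for s, f in zip(spectrum, floor)]
    return residual, minima
```

For the toy spectrum `[5, 2, 6, 3, 7, 4, 9]` the minima fall at bins 1, 3 and 5, and the residual is zero at each minimum and 3.5 at bins 2 and 4.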
US6140809A
CLAIM 1
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
and demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
whereby a demodulated output signal from the demodulation means is monitored in a time domain while a frequency spectrum (frequency spectrum) of said input signal is displayed in a frequency domain simultaneously .

US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
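
Claim 3 of US8990073B2 locates one maximum between each pair of consecutive minima. A minimal sketch of that step is given below; the function name `detect_peaks`, the tuple layout, and the choice of the first maximum on ties are assumptions for illustration only, and the minima indices are taken as given.

```python
def detect_peaks(residual, minima):
    """Return one (left_min, right_min, peak_bin) tuple per pair of consecutive minima."""
    peaks = []
    for lo, hi in zip(minima, minima[1:]):
        segment = residual[lo:hi + 1]
        # Position of the largest value inside the segment (first one on ties).
        offset = max(range(len(segment)), key=segment.__getitem__)
        peaks.append((lo, hi, lo + offset))
    return peaks
```

For a residual of `[0, 0, 3.5, 0, 3.5, 0, 0]` with minima at bins 1, 3 and 5, this yields two peaks, at bins 2 and 4.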
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
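
The correlation map of claim 4 of US8990073B2 can be sketched as below. The claim only requires "a normalized correlation value"; the cosine-style normalization used here is one common choice and is an assumption, as are the function name and the handling of the minimum bin shared by two neighbouring peaks (the later peak overwrites it).

```python
import math


def correlation_map(current, previous, minima):
    """Assign each peak's normalized correlation to every bin the peak spans."""
    cmap = [0.0] * len(current)
    for lo, hi in zip(minima, minima[1:]):
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        # Score of the peak; zero when either segment carries no energy.
        score = num / den if den > 0 else 0.0
        for k in range(lo, hi + 1):
            cmap[k] = score
    return cmap
```

When the current and previous residual spectra are identical, every bin covered by a peak receives a score of 1.0, which is what a perfectly stable tone would produce under this normalization.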
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis (monitor means) ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
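
Claim 5 of US8990073B2 filters the correlation map bin by bin through a one-pole filter and then sums the result. A minimal sketch follows, assuming the usual first-order recursion with an update factor `alpha`; the function names and the value of `alpha` are assumptions, not values taken from the patent.

```python
def update_long_term_map(lt_map, cmap, alpha):
    """One-pole (first-order recursive) smoothing applied per frequency bin."""
    return [alpha * old + (1.0 - alpha) * new for old, new in zip(lt_map, cmap)]


def summed_long_term_map(lt_map):
    """Sum the filtered map over all frequency bins."""
    return sum(lt_map)


# Usage: start from an initial map of zeros and feed two identical frames.
lt = [0.0, 0.0, 0.0]
lt = update_long_term_map(lt, [1.0, 1.0, 0.0], alpha=0.5)
first_sum = summed_long_term_map(lt)
lt = update_long_term_map(lt, [1.0, 1.0, 0.0], alpha=0.5)
second_sum = summed_long_term_map(lt)
```

The summed value grows from frame to frame while the same bins stay correlated, which is the behaviour that makes it usable as a tonal stability indicator.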
US6140809A
CLAIM 2
. The spectrum analyzer according to claim 1 , further comprising monitor means (frequency bin basis) , receiving the demodulated output signal from the demodulation means , for monitoring the demodulated output signal .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (vertical blanking) ;

wherein the tonal stability estimation is performed according to claim 1 .
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking (background noise signal) interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (said time) in order to distinguish a music signal from a background noise signal (vertical blanking) and prevent update (further control) of noise energy estimates on the music signal .
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking (background noise signal) interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US6140809A
CLAIM 6
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
wherein the input signal is a time division multiplex signal and the demodulation means is means for demodulating each time slot signal in the time division multiplex signal ;
channel detecting means for generating a sweep control pulse for the interval of one time slot from the demodulation means ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the control pulse so that components in the frequency band of the time division multiplex signal may be extracted from the later stage frequency converter ;
and display means for displaying a frequency spectrum of said time (noise character parameter) division multiplex signal based on the components in a frequency domain .

US6140809A
CLAIM 13
. A spectrum analyzer comprising : a first frequency converter converting an input carrier signal to be measured which has been modulated by an input modulation signal into a first intermediate frequency signal by mixing with a first local oscillation signal generated by a first frequency sweep oscillator ;
a controller controlling said first frequency sweep oscillator such that a frequency of said first local signal is held suspended to such a fixed frequency as to tune the first frequency converter to the input carrier signal ;
a second frequency converter converting the first intermediate frequency signal into a second intermediate frequency signal by mixing with a second local oscillation signal generated by a second frequency sweep oscillator ;
said controller further controlling (prevent update) said second frequency sweep oscillator such that said second local oscillation signal is swept with a swept frequency range , processing means for processing the second intermediate frequency signal to obtain a component data of the input carrier signal within the swept frequency range ;
display means for displaying a frequency spectrum in a frequency domain based on the data from the processing means ;
demodulation means for effecting demodulation on the first intermediate frequency signal which is supplied thereto from a point prior to input of the second frequency converter thereby to obtain a demodulated signal representing the input modulation signal ;
and whereby said demodulated signal from said demodulation means is displayed by said monitor means in a time domain as well as the frequency spectrum of the input carrier signal in a frequency domain simultaneously .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (said time) comprises : dividing a plurality of frequency bands into a first group (one time slot) of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value (second intermediate) of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
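
The noise character parameter of claim 28 of US8990073B2 can be sketched as below. The claim does not fix how each group's "energy value" is formed, which group is the numerator, or how the long-term value is tracked; this sketch assumes the mean band energy per group, the ratio of the first group to the second, and exponential smoothing with a hypothetical factor `alpha`.

```python
def noise_character(band_energies, n_first):
    """Ratio between the energy of the first `n_first` bands and the remaining bands."""
    if not 0 < n_first < len(band_energies):
        raise ValueError("n_first must leave at least one band in each group")
    # Assumption: each group's energy value is the mean energy of its bands.
    first = sum(band_energies[:n_first]) / n_first
    second = sum(band_energies[n_first:]) / (len(band_energies) - n_first)
    return first / second if second > 0 else float("inf")


def update_long_term(prev_long_term, value, alpha=0.9):
    """Long-term value tracked by exponential smoothing (assumed form)."""
    return alpha * prev_long_term + (1.0 - alpha) * value
```

For band energies `[4, 4, 1, 1, 1, 1]` with the first two bands forming the first group, the parameter is 4.0; a flat spectrum gives 1.0.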
US6140809A
CLAIM 6
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
wherein the input signal is a time division multiplex signal and the demodulation means is means for demodulating each time slot signal in the time division multiplex signal ;
channel detecting means for generating a sweep control pulse for the interval of one time slot (first group) from the demodulation means ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the control pulse so that components in the frequency band of the time division multiplex signal may be extracted from the later stage frequency converter ;
and display means for displaying a frequency spectrum of said time (noise character parameter) division multiplex signal based on the components in a frequency domain .

US6140809A
CLAIM 7
. A spectrum analyzer comprising : an earlier stage frequency converter wherein an input signal to be measure is mixed with a first local oscillation signal generated by a first local oscillator so that a first intermediate frequency signal is extracted ;
a controller controlling said earlier stage frequency converter such that a frequency sweep of the first local oscillation signal supplied to said earlier stage frequency converter is fixed to tune to a frequency of an input signal to be measured ;
a later stage frequency converter supplied with the first intermediate frequency signal from said first stage frequency converter , wherein a frequency of a second local oscillation signal generated by a second local oscillator and supplied to the later stage frequency converter is swept , and the first intermediate frequency signal is mixed with said second local oscillation signal so that a second intermediate (second energy value) frequency signal is extracted : processing means for processing an output signal from said later stage frequency converter to obtain data of a frequency component of the signal under processing ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for effecting demodulation on the signal from the earlier stage frequency converter which is branched by the branch means ;
control means responsive to said demodulation means for controlling generation of the local oscillation signal supplied to the later stage frequency converter ;
and display means for displaying a frequency spectrum of said input signal in a frequency domain from the data of the processing means and monitoring an output signal from the demodulation means in a time domain simultaneously .

US6140809A
CLAIM 13
. A spectrum analyzer comprising : a first frequency (first frequency) converter converting an input carrier signal to be measured which has been modulated by an input modulation signal into a first intermediate frequency signal by mixing with a first local oscillation signal generated by a first frequency sweep oscillator ;
a controller controlling said first frequency sweep oscillator such that a frequency of said first local signal is held suspended to such a fixed frequency as to tune the first frequency converter to the input carrier signal ;
a second frequency converter converting the first intermediate frequency signal into a second intermediate frequency signal by mixing with a second local oscillation signal generated by a second frequency sweep oscillator ;
said controller further controlling said second frequency sweep oscillator such that said second local oscillation signal is swept with a swept frequency range , processing means for processing the second intermediate frequency signal to obtain a component data of the input carrier signal within the swept frequency range ;
display means for displaying a frequency spectrum in a frequency domain based on the data from the processing means ;
demodulation means for effecting demodulation on the first intermediate frequency signal which is supplied thereto from a point prior to input of the second frequency converter thereby to obtain a demodulated signal representing the input modulation signal ;
and whereby said demodulated signal from said demodulation means is displayed by said monitor means in a time domain as well as the frequency spectrum of the input carrier signal in a frequency domain simultaneously .

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (said time) inferior than a given fixed threshold .
US6140809A
CLAIM 6
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
wherein the input signal is a time division multiplex signal and the demodulation means is means for demodulating each time slot signal in the time division multiplex signal ;
channel detecting means for generating a sweep control pulse for the interval of one time slot from the demodulation means ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the control pulse so that components in the frequency band of the time division multiplex signal may be extracted from the later stage frequency converter ;
and display means for displaying a frequency spectrum of said time (noise character parameter) division multiplex signal based on the components in a frequency domain .
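
The gating condition recited in claim 29 of US8990073B2 can be illustrated with a minimal sketch. It is not taken from either patent: the function name, the default threshold, and the first-order smoothing used for the noise update are assumptions chosen only to show the update being skipped when the noise character parameter falls below a fixed threshold.

```python
def update_noise_estimates(noise_est, band_energies, noise_char_lt,
                           threshold=1.0, alpha=0.9):
    """Return new per-band noise energy estimates.

    noise_char_lt : long-term noise character parameter (assumed input)
    threshold     : fixed threshold (illustrative value, not from the patent)
    alpha         : smoothing factor of the assumed first-order update
    """
    if noise_char_lt < threshold:
        # Update is prevented: keep the previous estimates unchanged.
        return list(noise_est)
    # Otherwise apply an assumed exponential-smoothing update per band.
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(noise_est, band_energies)]
```

With a parameter below the threshold the estimates are returned unchanged; above it, each band moves toward the current band energy.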

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US6140809A
CLAIM 1
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
and demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
whereby a demodulated output signal from the demodulation means is monitored in a time domain while a frequency spectrum (frequency spectrum) of said input signal is displayed in a frequency domain simultaneously .

US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US6140809A
CLAIM 7
. A spectrum analyzer comprising : an earlier stage frequency converter wherein an input signal to be measure is mixed with a first local oscillation signal generated by a first local oscillator so that a first intermediate frequency signal is extracted ;
a controller controlling said earlier stage frequency converter such that a frequency sweep of the first local oscillation signal supplied to said earlier stage frequency converter is fixed to tune to a frequency of an input signal to be measured ;
a later stage frequency converter supplied with the first intermediate frequency signal from said first stage (initial value) frequency converter , wherein a frequency of a second local oscillation signal generated by a second local oscillator and supplied to the later stage frequency converter is swept , and the first intermediate frequency signal is mixed with said second local oscillation signal so that a second intermediate frequency signal is extracted : processing means for processing an output signal from said later stage frequency converter to obtain data of a frequency component of the signal under processing ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for effecting demodulation on the signal from the earlier stage frequency converter which is branched by the branch means ;
control means responsive to said demodulation means for controlling generation of the local oscillation signal supplied to the later stage frequency converter ;
and display means for displaying a frequency spectrum of said input signal in a frequency domain from the data of the processing means and monitoring an output signal from the demodulation means in a time domain simultaneously .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum (frequency spectrum) of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (first stage) of the long-term correlation map .
US6140809A
CLAIM 1
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
and demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
whereby a demodulated output signal from the demodulation means is monitored in a time domain while a frequency spectrum (frequency spectrum) of said input signal is displayed in a frequency domain simultaneously .

US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US6140809A
CLAIM 7
. A spectrum analyzer comprising : an earlier stage frequency converter wherein an input signal to be measure is mixed with a first local oscillation signal generated by a first local oscillator so that a first intermediate frequency signal is extracted ;
a controller controlling said earlier stage frequency converter such that a frequency sweep of the first local oscillation signal supplied to said earlier stage frequency converter is fixed to tune to a frequency of an input signal to be measured ;
a later stage frequency converter supplied with the first intermediate frequency signal from said first stage (initial value) frequency converter , wherein a frequency of a second local oscillation signal generated by a second local oscillator and supplied to the later stage frequency converter is swept , and the first intermediate frequency signal is mixed with said second local oscillation signal so that a second intermediate frequency signal is extracted : processing means for processing an output signal from said later stage frequency converter to obtain data of a frequency component of the signal under processing ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for effecting demodulation on the signal from the earlier stage frequency converter which is branched by the branch means ;
control means responsive to said demodulation means for controlling generation of the local oscillation signal supplied to the later stage frequency converter ;
and display means for displaying a frequency spectrum of said input signal in a frequency domain from the data of the processing means and monitoring an output signal from the demodulation means in a time domain simultaneously .
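
The peak-detection and correlation-map elements of claim 31 of US8990073B2 can be illustrated with a minimal sketch. It is not taken from either patent: the function names, the half-open bin ranges, and the normalized squared correlation used as the per-peak score are assumptions made for illustration. The inputs are a current and a previous residual spectrum plus the bin indices of the current residual spectrum's minima.

```python
def peaks_between_minima(minima):
    """Each peak is the span of bins between a pair of successive minima."""
    return list(zip(minima, minima[1:]))


def correlation_map(current, previous, minima):
    """Score each detected peak against the same bins of the previous frame.

    The score (assumed here) is the squared cross-correlation normalized by
    the two energies, so it lies in [0, 1]. Every bin of a peak gets the
    score of that peak; bins outside any peak stay at 0.
    """
    cmap = [0.0] * len(current)
    for start, end in peaks_between_minima(minima):
        cur = current[start:end]
        prev = previous[start:end]
        cross = sum(c * p for c, p in zip(cur, prev))
        energy = sum(c * c for c in cur) * sum(p * p for p in prev)
        score = (cross * cross) / energy if energy > 0.0 else 0.0
        for i in range(start, end):
            cmap[i] = score
    return cmap
```

A peak whose shape is unchanged from the previous frame scores 1, and a peak with no counterpart in the previous frame scores 0.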

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency band) comprises : a locator of the minima in the frequency spectrum (frequency spectrum) of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6140809A
CLAIM 1
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
and demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means ;
whereby a demodulated output signal from the demodulation means is monitored in a time domain while a frequency spectrum (frequency spectrum) of said input signal is displayed in a frequency domain simultaneously .

US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band (current residual spectrum) of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .
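
The three elements of claim 32 of US8990073B2 (locator of minima, estimator of a spectral floor connecting the minima, subtractor) can be illustrated with a minimal sketch. It is not taken from either patent: the function names, the linear interpolation between minima, the flat extension of the floor outside the first and last minimum, and the clamping of negative residuals to zero are all assumptions.

```python
def find_minima(spectrum):
    """Indices of interior local minima (fallback: the global minimum)."""
    minima = [i for i in range(1, len(spectrum) - 1)
              if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]]
    return minima or [spectrum.index(min(spectrum))]


def spectral_floor(spectrum, minima):
    """Floor that connects successive minima with straight lines (assumed)."""
    first, last = minima[0], minima[-1]
    # Outside the outermost minima the floor is held flat (assumption).
    floor = [spectrum[first] if i <= first else spectrum[last]
             for i in range(len(spectrum))]
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the floor from the spectrum; negatives are clamped to 0."""
    minima = find_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    residual = [max(s - f, 0.0) for s, f in zip(spectrum, floor)]
    return residual, minima
```

By construction the residual is zero at every located minimum, so the pieces between successive minima stand out as isolated peaks.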

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis (monitor means) ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6140809A
CLAIM 2
. The spectrum analyzer according to claim 1 , further comprising monitor means (frequency bin basis) , receiving the demodulated output signal from the demodulation means , for monitoring the demodulated output signal .
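
The filter and adder of claim 33 of US8990073B2 can be illustrated with a minimal sketch. It is not taken from either patent: the function names and the first-order recursive form of the per-bin filter are assumptions, chosen because the independent claims describe the long-term correlation map in terms of an update factor, the current frame's correlation map, and an initial value.

```python
def update_long_term_map(long_term, cmap, alpha=0.9):
    """Filter the correlation map bin by bin.

    long_term : previous long-term map (its initial value on the first frame)
    cmap      : correlation map of the current frame
    alpha     : update factor (illustrative default, not from the patent)
    """
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Sum the filtered map over all frequency bins."""
    return sum(long_term)
```

Under this assumed form, a sum that stays high over many frames would indicate spectral peaks that persist at the same positions.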

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal (vertical blanking) ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking (background noise signal) interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal (vertical blanking) ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking (background noise signal) interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal (bandpass filter) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6140809A
CLAIM 4
. A spectrum analyzer according to claim 3 , in which a converted output from the earlier stage frequency converter is extracted therefrom by a bandpass filter (average signal) having a passband which exhibits a flat frequency response over at least the frequency bandwidth of the television signal .
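
The comparator of claim 38 of US8990073B2 can be illustrated with a minimal sketch. It is not taken from either patent: the function names and the linear threshold function with its slope and offset are assumptions, since the claim only states that the threshold is a function of the long-term SNR.

```python
def snr_threshold(snr_lt, slope=0.1, offset=10.0):
    """Threshold as a function of the long-term SNR (assumed linear form)."""
    return slope * snr_lt + offset


def is_active(snr_av, snr_lt):
    """Compare the average SNR with the long-term-SNR-dependent threshold."""
    return snr_av > snr_threshold(snr_lt)
```

With these illustrative constants, a long-term SNR of 20 gives a threshold of 12, so an average SNR of 15 would be classified as active and 11 as inactive.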

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal (vertical blanking) and preventing update of noise energy estimates .
US6140809A
CLAIM 3
. A spectrum analyzer comprising : at least two stages of frequency converters wherein a frequency of a local oscillation signal supplied to a later stage frequency converter is swept while a frequency sweep of a local oscillation signal supplied to an earlier stage frequency converter is fixed to determine a frequency of an input signal to be measured ;
branch means for branching an output signal from the earlier stage frequency converter at a point prior to the later stage frequency converter ;
demodulation means for demodulating the signal from the earlier stage frequency converter which is branched by the branch means , wherein the input signal is a television signal and the demodulation means is means for demodulating a picture signal as well as a sync signal ;
sweep control pulse generating means responsive to the sync signal demodulated by the demodulation means for generating a sweep control pulse during a no-modulation interval contained in a vertical blanking (background noise signal) interval of the television signal ;
means responsive to the sweep control pulse for generating a ramp voltage causing the frequency of the local oscillation signal supplied to the later stage frequency converter to be swept during the duration of the sweep control pulse so that components in the frequency band of the television signal may be extracted from the later stage frequency converter ;
and display means having a display panel for displaying the demodulated picture signal from the demodulation means in a time domain as well as a frequency spectrum of said input signal based on the thus extracted components in a frequency domain on the same display panel and simultaneously .




US5943429A

Filed: 1997-07-28     Issued: 1999-08-24

Spectral subtraction noise suppression method

(Original Assignee) Telefonaktiebolaget LM Ericsson AB     (Current Assignee) Telefonaktiebolaget LM Ericsson AB

Peter Handel
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long term correlation map .
US5943429A
CLAIM 1
. A spectral subtraction noise suppression method in a frame based digital communication system , each frame including a predetermined number N of audio samples , thereby giving each frame N degrees of freedom , wherein a spectral subtraction function H(ω) is based on an estimate Φ v (ω) of a power spectral density (initial value) of background noise of non-speech frames and an estimate Φ x (ω) of a power spectral density of speech frames comprising the steps of : approximating each speech frame by a parametric model that reduces the number of degrees of freedom to less than N ;
estimating said estimate Φ x (ω) of the power spectral density of each speech frame by a parametric power spectrum estimation method based on the approximative parametric model ;
and estimating said estimate Φ v (ω) of the power spectral density of each non-speech frame by a non-parametric power spectrum estimation method .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (weighting function) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5943429A
CLAIM 5
. The method of claim 3 , wherein the a spectral subtraction function H(ω) is in accordance with the formula : ##EQU45## where G(ω) is a weighting function (first group) and δ(ω) is a subtraction factor .
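
The noise character parameter of claim 28 of US8990073B2 can be illustrated with a minimal sketch. It is not taken from either patent: the function names, the choice of how many bands form the first group, and the exponential smoothing used for the long-term value are assumptions.

```python
def noise_character(band_energies, n_first):
    """Ratio between the energy of the first group of bands and the rest.

    band_energies : per-band energies of the current frame
    n_first       : number of bands in the first group (assumed parameter)
    """
    first = sum(band_energies[:n_first])
    rest = sum(band_energies[n_first:])
    return first / rest if rest > 0.0 else float("inf")


def long_term_noise_character(previous_lt, current, alpha=0.9):
    """Long-term value via assumed first-order smoothing."""
    return alpha * previous_lt + (1.0 - alpha) * current
```

For band energies [2, 2, 1, 1] with the first two bands as the first group, the ratio is 4 / 2 = 2.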

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long-term correlation map .
US5943429A
CLAIM 1
. A spectral subtraction noise suppression method in a frame based digital communication system , each frame including a predetermined number N of audio samples , thereby giving each frame N degrees of freedom , wherein a spectral subtraction function H(ω) is based on an estimate Φ v (ω) of a power spectral density (initial value) of background noise of non-speech frames and an estimate Φ x (ω) of a power spectral density of speech frames comprising the steps of : approximating each speech frame by a parametric model that reduces the number of degrees of freedom to less than N ;
estimating said estimate Φ x (ω) of the power spectral density of each speech frame by a parametric power spectrum estimation method based on the approximative parametric model ;
and estimating said estimate Φ v (ω) of the power spectral density of each non-speech frame by a non-parametric power spectrum estimation method .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value (power spectral density) of the long-term correlation map .
US5943429A
CLAIM 1
. A spectral subtraction noise suppression method in a frame based digital communication system , each frame including a predetermined number N of audio samples , thereby giving each frame N degrees of freedom , wherein a spectral subtraction function H(ω) is based on an estimate Φ v (ω) of a power spectral density (initial value) of background noise of non-speech frames and an estimate Φ x (ω) of a power spectral density of speech frames comprising the steps of : approximating each speech frame by a parametric model that reduces the number of degrees of freedom to less than N ;
estimating said estimate Φ x (ω) of the power spectral density of each speech frame by a parametric power spectrum estimation method based on the approximative parametric model ;
and estimating said estimate Φ v (ω) of the power spectral density of each non-speech frame by a non-parametric power spectrum estimation method .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6430295B1

Filed: 1997-07-11     Issued: 2002-08-06

Methods and apparatus for measuring signal level and delay at multiple sensors

(Original Assignee) Telefonaktiebolaget LM Ericsson AB     (Current Assignee) Telefonaktiebolaget LM Ericsson AB

Peter Händel, Jim Rasmusson
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum (frequency band) comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .
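
Illustrative sketch (not taken from either patent): the three steps recited in claim 2 of US8990073B2 (search for minima, connect them into a spectral floor, subtract) can be pictured as below. The straight-line interpolation between minima, the inclusion of the first and last bins as floor anchors, and all function names are assumptions made only for this example.

```python
def local_minima(spectrum):
    """Indices of local minima of the spectrum.

    Assumption: the first and last bins are always included so that the
    floor spans the whole spectrum.
    """
    n = len(spectrum)
    idx = [0]
    for i in range(1, n - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    if n > 1:
        idx.append(n - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect consecutive minima with straight segments (assumed shape)."""
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the estimated floor from the spectrum, bin by bin."""
    minima = local_minima(spectrum)
    floor = spectral_floor(spectrum, minima)
    return [s - f for s, f in zip(spectrum, floor)]


# Two peaks sitting on a flat floor of 1.0
print(residual_spectrum([1.0, 3.0, 1.0, 5.0, 1.0]))
```

In this toy input the floor is flat, so the residual keeps only the two peaks and is zero at every minimum.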

US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum (frequency band) comprises locating a maximum between each pair of two consecutive minima of the current residual spectrum .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .
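
Illustrative sketch (not taken from either patent): claim 3 of US8990073B2 locates one maximum between each pair of consecutive minima of the residual spectrum. The helper names and the handling of the first and last bins as boundary minima are assumptions for this example.

```python
def find_minima(values):
    """Local minima indices; first and last bins are treated as minima
    (an assumption of this sketch)."""
    n = len(values)
    idx = [0] + [
        i for i in range(1, n - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]
    if n > 1:
        idx.append(n - 1)
    return idx


def detect_peaks(residual):
    """Return (lo, hi, top) per peak: the two delimiting minima and the
    bin of the maximum lying between them."""
    minima = find_minima(residual)
    peaks = []
    for lo, hi in zip(minima, minima[1:]):
        top = max(range(lo, hi + 1), key=lambda k: residual[k])
        peaks.append((lo, hi, top))
    return peaks


print(detect_peaks([0.0, 2.0, 0.0, 4.0, 0.0]))
```

Each returned tuple describes one "piece" of the residual spectrum in the sense of claim 1: the segment between two successive minima, plus the position of its maximum.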

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum (frequency band) , calculating a normalized correlation value with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .
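
Illustrative sketch (not taken from either patent): claim 4 of US8990073B2 computes, for each detected peak, a normalized correlation with the previous residual spectrum over the bins between the two delimiting minima, then writes that score into every bin of the peak. The cosine-style normalization used here is an assumption, since the claim does not fix the formula; the function name and the `(lo, hi)` peak representation are also hypothetical.

```python
import math


def correlation_map(current, previous, peaks):
    """Build a per-bin correlation map.

    peaks: list of (lo, hi) bin-index pairs, inclusive, delimiting each peak.
    Assumption: the score is sum(c*p) / sqrt(sum(c^2) * sum(p^2)).
    """
    cmap = [0.0] * len(current)
    for lo, hi in peaks:
        cur = current[lo:hi + 1]
        prev = previous[lo:hi + 1]
        num = sum(c * p for c, p in zip(cur, prev))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in prev))
        score = num / den if den > 0 else 0.0
        # A minimum shared by two adjacent peaks takes the later peak's score.
        for k in range(lo, hi + 1):
            cmap[k] = score
    return cmap


current = [0.0, 2.0, 0.0, 4.0, 0.0]
previous = [0.0, 2.0, 0.0, 0.0, 0.0]
print(correlation_map(current, previous, [(0, 2), (2, 4)]))
```

In this toy case the first peak is present in both frames and scores 1.0, while the second peak has no counterpart in the previous frame and scores 0.0.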

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
US6430295B1
CLAIM 10
. A signal processing device according to claim 1 , wherein the filtering characteristic of said first filter includes coefficients which are adjusted to optimize a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) of said first filter .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group (signal processor, signal source) of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6430295B1
CLAIM 1
. A signal processing device , comprising : a first signal sensor ;
a first filter having an input coupled to an output of said first sensor ;
a second signal sensor ;
a second filter having an input coupled to an output of said second sensor and having an adjustable filtering characteristic ;
a summing device having a first input coupled to an output of said first filter and a second input coupled to an output of said second filter , wherein the adjustable filtering characteristic of said second filter is adjusted in dependence upon an output of said summing device to cause the second filter output to emulate the first filter output ;
and a processor for computing an estimate of at least one parameter indicating a relationship between said first and second sensors , wherein the estimate is computed as a function of a filtering characteristic of said first filter and as a function of the adjustable filtering characteristic of said second filter , and wherein the estimate includes estimates of a relative time delay and a relative scale factor between said first and second sensors with respect to a signal source (second group, sound activity detector) .

US6430295B1
CLAIM 2
. A signal processing device according to claim 1 , wherein said first filter , said second filter , said summing device and said processor are implemented using a digital signal processor (second group, sound activity detector) (DSP) integrated circuit (IC) .

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum (frequency band) of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum (frequency band) comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US6430295B1
CLAIM 8
. A signal processing device according to claim 7 , wherein the filtering characteristic of the first filter includes a passband corresponding to the 300-600 Hz frequency band (current residual spectrum) .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (signal processor, signal source) .
US6430295B1
CLAIM 1
. A signal processing device , comprising : a first signal sensor ;
a first filter having an input coupled to an output of said first sensor ;
a second signal sensor ;
a second filter having an input coupled to an output of said second sensor and having an adjustable filtering characteristic ;
a summing device having a first input coupled to an output of said first filter and a second input coupled to an output of said second filter , wherein the adjustable filtering characteristic of said second filter is adjusted in dependence upon an output of said summing device to cause the second filter output to emulate the first filter output ;
and a processor for computing an estimate of at least one parameter indicating a relationship between said first and second sensors , wherein the estimate is computed as a function of a filtering characteristic of said first filter and as a function of the adjustable filtering characteristic of said second filter , and wherein the estimate includes estimates of a relative time delay and a relative scale factor between said first and second sensors with respect to a signal source (second group, sound activity detector) .

US6430295B1
CLAIM 2
. A signal processing device according to claim 1 , wherein said first filter , said second filter , said summing device and said processor are implemented using a digital signal processor (second group, sound activity detector) (DSP) integrated circuit (IC) .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (signal processor, signal source) comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6430295B1
CLAIM 1
. A signal processing device , comprising : a first signal sensor ;
a first filter having an input coupled to an output of said first sensor ;
a second signal sensor ;
a second filter having an input coupled to an output of said second sensor and having an adjustable filtering characteristic ;
a summing device having a first input coupled to an output of said first filter and a second input coupled to an output of said second filter , wherein the adjustable filtering characteristic of said second filter is adjusted in dependence upon an output of said summing device to cause the second filter output to emulate the first filter output ;
and a processor for computing an estimate of at least one parameter indicating a relationship between said first and second sensors , wherein the estimate is computed as a function of a filtering characteristic of said first filter and as a function of the adjustable filtering characteristic of said second filter , and wherein the estimate includes estimates of a relative time delay and a relative scale factor between said first and second sensors with respect to a signal source (second group, sound activity detector) .

US6430295B1
CLAIM 2
. A signal processing device according to claim 1 , wherein said first filter , said second filter , said summing device and said processor are implemented using a digital signal processor (second group, sound activity detector) (DSP) integrated circuit (IC) .

US6430295B1
CLAIM 10
. A signal processing device according to claim 1 , wherein the filtering characteristic of said first filter includes coefficients which are adjusted to optimize a signal-to-noise ratio (noise ratio, SNR LT, SNR calculation) of said first filter .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (signal processor, signal source) .
US6430295B1
CLAIM 1
. A signal processing device , comprising : a first signal sensor ;
a first filter having an input coupled to an output of said first sensor ;
a second signal sensor ;
a second filter having an input coupled to an output of said second sensor and having an adjustable filtering characteristic ;
a summing device having a first input coupled to an output of said first filter and a second input coupled to an output of said second filter , wherein the adjustable filtering characteristic of said second filter is adjusted in dependence upon an output of said summing device to cause the second filter output to emulate the first filter output ;
and a processor for computing an estimate of at least one parameter indicating a relationship between said first and second sensors , wherein the estimate is computed as a function of a filtering characteristic of said first filter and as a function of the adjustable filtering characteristic of said second filter , and wherein the estimate includes estimates of a relative time delay and a relative scale factor between said first and second sensors with respect to a signal source (second group, sound activity detector) .

US6430295B1
CLAIM 2
. A signal processing device according to claim 1 , wherein said first filter , said second filter , said summing device and said processor are implemented using a digital signal processor (second group, sound activity detector) (DSP) integrated circuit (IC) .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6072881A

Filed: 1997-06-09     Issued: 2000-06-06

Microphone noise rejection system

(Original Assignee) Chiefs Voice Inc     (Current Assignee) Chiefs Voice Inc

Frank X. Linder
US8990073B2
CLAIM 3
. A method as defined in claim 1 , wherein detecting the peaks in the current residual spectrum comprises locating a maximum between each pair of two consecutive minima (characteristic frequency, half period) of the current residual spectrum .
US6072881A
CLAIM 1
. A method of rejecting repetitive noise introduced into an information signal , comprising the steps of : receiving from a first source a repetitive noise signal ;
identifying a characteristic frequency (two consecutive minima) of the repetitive noise signal received from the first source ;
receiving from a second source that is distinct from the first source an information signal having an information component and a repetitive noise component originating from the first source ;
delaying the information signal for a period of time based on the identified characteristic frequency of the repetitive noise signal to form a phase-shifted or delayed information signal ;
and processing the delayed information signal with a non-delayed information signal received from the second source to form a processed information signal in which the information component is substantial and the noise component is negligible .

US6072881A
CLAIM 2
. A method as defined in claim 1 , wherein the step of delaying comprises delaying the information signal by an odd number of half periods (two consecutive minima) of the repetitive noise signal , and the step of processing comprises adding the delayed information signal to the non-delayed information signal to form the processed information signal .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (odd number) between two consecutive minima (characteristic frequency, half period) in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6072881A
CLAIM 1
. A method of rejecting repetitive noise introduced into an information signal , comprising the steps of : receiving from a first source a repetitive noise signal ;
identifying a characteristic frequency (two consecutive minima) of the repetitive noise signal received from the first source ;
receiving from a second source that is distinct from the first source an information signal having an information component and a repetitive noise component originating from the first source ;
delaying the information signal for a period of time based on the identified characteristic frequency of the repetitive noise signal to form a phase-shifted or delayed information signal ;
and processing the delayed information signal with a non-delayed information signal received from the second source to form a processed information signal in which the information component is substantial and the noise component is negligible .

US6072881A
CLAIM 2
. A method as defined in claim 1 , wherein the step of delaying comprises delaying the information signal by an odd number (frequency bins, pole filter) of half periods (two consecutive minima) of the repetitive noise signal , and the step of processing comprises adding the delayed information signal to the non-delayed information signal to form the processed information signal .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (odd number) so as to produce a summed long-term correlation map .
US6072881A
CLAIM 2
. A method as defined in claim 1 , wherein the step of delaying comprises delaying the information signal by an odd number (frequency bins, pole filter) of half periods of the repetitive noise signal , and the step of processing comprises adding the delayed information signal to the non-delayed information signal to form the processed information signal .
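
Illustrative sketch (not taken from either patent): claim 5 of US8990073B2 filters the correlation map through a one-pole filter, bin by bin, and sums the result over the bins. The recursion below, `alpha * previous + (1 - alpha) * current`, is the common form of a one-pole smoother and is assumed here; the zero initial value and the update factor 0.5 are hypothetical choices for the example.

```python
def update_long_term_map(long_term, cmap, alpha):
    """One-pole (first-order recursive) smoothing applied independently to
    each frequency bin. alpha plays the role of the update factor."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


def summed_long_term_map(long_term):
    """Collapse the per-bin long-term map into a single scalar."""
    return sum(long_term)


# Assumed initial value: all zeros. Two frames with the same correlation map.
long_term = [0.0, 0.0, 0.0]
for frame_cmap in ([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]):
    long_term = update_long_term_map(long_term, frame_cmap, alpha=0.5)
    print(long_term, summed_long_term_map(long_term))
```

Bins whose correlation stays high across frames converge toward 1, so the summed value grows when the spectrum contains stable tonal peaks.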

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (odd number) having a magnitude that exceeds a given fixed threshold .
US6072881A
CLAIM 2
. A method as defined in claim 1 , wherein the step of delaying comprises delaying the information signal by an odd number (frequency bins, pole filter) of half periods of the repetitive noise signal , and the step of processing comprises adding the delayed information signal to the non-delayed information signal to form the processed information signal .
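
Illustrative sketch (not taken from either patent): the search recited in claim 7 of US8990073B2 amounts to scanning the correlation map for bins whose magnitude exceeds a fixed threshold. The function name and the threshold value used in the example are hypothetical.

```python
def strong_tone_bins(cmap, threshold):
    """Indices of correlation-map bins whose magnitude exceeds the threshold."""
    return [k for k, v in enumerate(cmap) if abs(v) > threshold]


# 0.8 is an arbitrary example threshold, not a value from the patent.
print(strong_tone_bins([0.1, 0.9, 0.5, 0.95], threshold=0.8))
```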

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (A/D converter) indicative of sound activity in the sound signal .
US6072881A
CLAIM 8
. A method as defined in claim 1 , wherein : the step of delaying includes the steps of passing an analog information signal through an A/D converter (adaptive threshold) to form a digital information signal , and delaying the digital information signal , and wherein : the step of processing includes the steps of processing the delayed digital information signal with a non-delayed digital information signal to form a digital processed information , and passing the processed information signal through a D/A converter to form an analog processed information signal .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy (log information) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6072881A
CLAIM 8
. A method as defined in claim 1 , wherein : the step of delaying includes the steps of passing an analog information (first energy, first energy value) signal through an A/D converter to form a digital information signal , and delaying the digital information signal , and wherein : the step of processing includes the steps of processing the delayed digital information signal with a non-delayed digital information signal to form a digital processed information , and passing the processed information signal through a D/A converter to form an analog processed information signal .
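
Illustrative sketch (not taken from either patent): claim 28 of US8990073B2 splits the frequency bands into a first group and the remaining bands, forms the ratio of the two group energies, and tracks a long-term value of that ratio. Summing band energies within each group, the recursive smoothing form, and every name below are assumptions for this example.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the
    remaining bands. Assumption: group energy is the plain sum of its bands."""
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0 else float("inf")


def update_long_term(previous, value, alpha):
    """Assumed first-order recursive average for the long-term value."""
    return alpha * previous + (1.0 - alpha) * value


ratio = noise_character([4.0, 4.0, 2.0, 2.0], n_first=2)
print(ratio, update_long_term(0.0, ratio, alpha=0.5))
```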

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (odd number) so as to produce a summed long-term correlation map .
US6072881A
CLAIM 2
. A method as defined in claim 1 , wherein the step of delaying comprises delaying the information signal by an odd number (frequency bins, pole filter) of half periods of the repetitive noise signal , and the step of processing comprises adding the delayed information signal to the non-delayed information signal to form the processed information signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5911128A

Filed: 1997-03-11     Issued: 1999-06-08

Method and apparatus for performing speech frame encoding mode selection in a variable rate encoding system

(Original Assignee) Dejaco; Andrew P.     

Andrew P. DeJaco
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value of the long term correlation map .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (third threshold) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (high frequency components) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5911128A
CLAIM 4
. The apparatus of claim 2 wherein said set of parameters further includes a zero crossings count indicative of a presence of high frequency components (frequency bins) in said speech frame .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (high frequency components) so as to produce a summed long-term correlation map .
US5911128A
CLAIM 4
. The apparatus of claim 2 wherein said set of parameters further includes a zero crossings count indicative of a presence of high frequency components (frequency bins) in said speech frame .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises searching in the correlation map for frequency bins (high frequency components) having a magnitude that exceeds a given fixed threshold .
US5911128A
CLAIM 4
. The apparatus of claim 2 wherein said set of parameters further includes a zero crossings count indicative of a presence of high frequency components (frequency bins) in said speech frame .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (rate selection) .
US5911128A
CLAIM 1
. An apparatus for selecting an encoding rate from a predetermined set of encoding rates for encoding a frame of speech including a plurality of speech samples , comprising : mode measurement means , responsive to said speech samples and to at least one signal derived from said speech samples , for generating a set of parameters indicative of characteristics of said frame of speech ;
and rate determination logic means for receiving said set of parameters , for determining the psychoacoustic significance of said speech samples in accordance with said set of parameters and for selecting an encoding rate from said predetermined set of encoding rates using predetermined rate selection (SNR calculation) rules , wherein said rate selection rules select said encoding rate which allocates a first number of bits for the encoding of said speech samples when said speech samples are determined to be of greater psychoacoustic significance and wherein said rate selection rules select said encoding rate which allocates a second number of bits for the encoding of said speech samples when said speech samples are determined to be of a lesser psychoacoustic significance and wherein said first number of bits is greater than said second number of bits .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (third threshold) energy and an average frame energy (average frame energy) .
US5911128A
CLAIM 6
. The apparatus of claim 2 wherein said set of parameters further includes a frame energy differential measurement indicative of changes in energy between energy of said speech frame and an average frame energy (average frame energy) .

US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (third threshold) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .
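
The spectral diversity computation recited in claim 24 of US8990073B2 can be illustrated with a minimal Python sketch. It follows only the claim wording (a per-band energy ratio between the current and previous frame, summed with weights over bands above a given index). The function name, the uniform default weights and the eps guard against division by zero are assumptions made for illustration and are not taken from either patent.

```python
def spectral_diversity(cur_energy, prev_energy, given_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands above given_band."""
    if weights is None:
        # Assumption: uniform weights when none are supplied.
        weights = [1.0] * len(cur_energy)
    total = 0.0
    # "Higher than a given number" is read here as band index > given_band.
    for i in range(given_band + 1, len(cur_energy)):
        ratio = cur_energy[i] / max(prev_energy[i], eps)
        total += weights[i] * ratio
    return total


if __name__ == "__main__":
    cur = [1.0, 2.0, 4.0, 8.0]
    prev = [1.0, 1.0, 2.0, 2.0]
    print(spectral_diversity(cur, prev, given_band=1))
```

Only bands 2 and 3 contribute in the usage example, since bands 0 and 1 are not higher than the given number.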

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (previous frame) indicative of an activity of the sound signal .
US5911128A
CLAIM 2
. The apparatus of claim 1 wherein said set of parameters includes an encoding quality ratio indicative of a match between a previous frame (activity prediction parameter) of speech and synthesized speech derived therefrom .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (previous frame) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US5911128A
CLAIM 2
. The apparatus of claim 1 wherein said set of parameters includes an encoding quality ratio indicative of a match between a previous frame (activity prediction parameter) of speech and synthesized speech derived therefrom .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (previous frame) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US5911128A
CLAIM 2
. The apparatus of claim 1 wherein said set of parameters includes an encoding quality ratio indicative of a match between a previous frame (activity prediction parameter) of speech and synthesized speech derived therefrom .
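
The gating condition of claim 27 of US8990073B2 reduces to a two-threshold test, sketched below in Python. The claim only states that both thresholds are fixed. The function name and the default threshold values are hypothetical placeholders chosen for illustration.

```python
def noise_update_allowed(activity_prediction, complementary_nonstat,
                         activity_threshold=0.8, nonstat_threshold=0.1):
    """Return False only when both parameters exceed their thresholds together.

    The default threshold values are illustrative assumptions, not values
    taken from the patent.
    """
    prevented = (activity_prediction > activity_threshold
                 and complementary_nonstat > nonstat_threshold)
    return not prevented


if __name__ == "__main__":
    print(noise_update_allowed(0.9, 0.5))  # both exceed: update prevented
    print(noise_update_allowed(0.9, 0.05))  # only one exceeds: update allowed
```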

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value of the long-term correlation map .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .
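
The correlation map and long-term correlation map elements of claim 30 of US8990073B2 can be illustrated with a minimal Python sketch. The claim text quoted above does not give formulas, so the squared normalized correlation per peak and the first-order recursive update are assumptions, as are the names correlation_map and update_long_term, the update factor value alpha, and the all-zero initial value.

```python
def correlation_map(cur_residual, prev_residual, minima):
    """Per-bin map: every bin of a peak receives that peak's correlation value.

    A peak is the piece of the residual spectrum between two successive minima.
    A bin shared by two peaks (the minimum itself) takes the later peak's value.
    """
    cmap = [0.0] * len(cur_residual)
    for a, b in zip(minima, minima[1:]):
        bins = range(a, b + 1)
        num = sum(cur_residual[k] * prev_residual[k] for k in bins)
        den = (sum(cur_residual[k] ** 2 for k in bins)
               * sum(prev_residual[k] ** 2 for k in bins))
        # Assumption: squared normalized correlation, zero when undefined.
        value = (num * num / den) if den > 0.0 else 0.0
        for k in bins:
            cmap[k] = value
    return cmap


def update_long_term(long_term_map, cmap, alpha=0.9):
    """Assumed recursive update: alpha is the update factor."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term_map, cmap)]


if __name__ == "__main__":
    cur = [0.0, 2.0, 0.0, 3.0, 0.0]
    prev = [0.0, 0.0, 0.0, 3.0, 0.0]
    cmap = correlation_map(cur, prev, minima=[0, 2, 4])
    long_term = update_long_term([0.0] * len(cur), cmap, alpha=0.5)
    print(cmap, long_term)
```

A peak that keeps its shape from frame to frame drives its bins of the long-term map toward 1, which is the sense in which the map indicates tonal stability.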

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (third threshold) , and an initial value of the long-term correlation map .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (third threshold) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5911128A
CLAIM 10
. The apparatus of claim 1 wherein said set of parameters comprises a normalized autocorrelation measurement indicative of periodicity in said speech samples , an encoding quality ratio indicative of a match between a previous frame of speech and synthesized speech derived therefrom , and a prediction gain differential measurement indicative of a frame to frame stability of a set of formant parameters , and wherein when said normalized autocorrelation measurement exceeds a predetermined first threshold , said prediction gain differential is below a second predetermined threshold and said encoding quality ratio exceeds a predetermined third threshold (current frame, current frame energy) , said rate determination logic means selects an encoding mode of half rate encoding .
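
The three components of claim 32 of US8990073B2 (locator of minima, estimator of the spectral floor, subtractor) can be illustrated with a minimal Python sketch. The claim says the floor "connects the minima" without saying how. Linear interpolation, treating the first and last bins as minima, and clamping negative residuals to zero are all assumptions made for illustration.

```python
def local_minima(spectrum):
    """Indices of local minima; the end bins are always included (assumption)."""
    n = len(spectrum)
    if n == 0:
        return []
    idx = [0]
    for k in range(1, n - 1):
        if spectrum[k] < spectrum[k - 1] and spectrum[k] <= spectrum[k + 1]:
            idx.append(k)
    if n > 1:
        idx.append(n - 1)
    return idx


def residual_spectrum(spectrum):
    """Subtract a piecewise-linear floor drawn through the minima."""
    minima = local_minima(spectrum)
    floor = list(spectrum)
    for a, b in zip(minima, minima[1:]):
        for k in range(a, b + 1):
            t = (k - a) / (b - a)
            floor[k] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    # Clamp at zero so the residual never goes negative (assumption).
    return [max(0.0, s - f) for s, f in zip(spectrum, floor)]


if __name__ == "__main__":
    print(residual_spectrum([1.0, 3.0, 1.0, 5.0, 2.0]))
```

In the usage example the minima fall at bins 0, 2 and 4, so the residual keeps only the two peaks at bins 1 and 3.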

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (high frequency components) so as to produce a summed long-term correlation map .
US5911128A
CLAIM 4
. The apparatus of claim 2 wherein said set of parameters further includes a zero crossings count indicative of a presence of high frequency components (frequency bins) in said speech frame .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5933495A

Filed: 1997-02-07     Issued: 1999-08-03

Subband acoustic noise suppression

(Original Assignee) Texas Instruments Inc     (Current Assignee) Texas Instruments Inc

Stephen S. Oh
US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates (adaptive filter) when a tonal sound signal is detected .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .
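
The detector and FREEZE behaviour recited in claim 1 of US5933495A can be illustrated with a minimal Python sketch. The two-threshold test follows the claim wording. The coefficient update shown is a hypothetical gradient-style step used only to demonstrate the freeze, since the claim does not specify how the adaptive filter adapts. All function and parameter names are assumptions.

```python
def near_end_speech(epr, ecer, epr_threshold, ecer_threshold):
    """True only when EPR and ECER exceed their thresholds at the same time."""
    return epr > epr_threshold and ecer > ecer_threshold


def adapt_coefficients(coeffs, gradient, step, freeze):
    """Hypothetical coefficient update that is skipped while FREEZE is asserted."""
    if freeze:
        return list(coeffs)  # coefficients are held unchanged
    return [c + step * g for c, g in zip(coeffs, gradient)]


if __name__ == "__main__":
    freeze = near_end_speech(epr=2.0, ecer=3.0, epr_threshold=1.0, ecer_threshold=1.0)
    print(freeze, adapt_coefficients([1.0, 2.0], [1.0, 1.0], 0.5, freeze))
```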

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal further comprises using a signal-to-noise ratio (SNR)-based sound activity detection (speech detector) .
US5933495A
CLAIM 13
. An echo cancellation and noise suppression apparatus comprising : a loudspeaker responsive to an output signal source for generating a corresponding first acoustic signal ;
a microphone responsive to a second acoustic signal which includes a component of the first acoustic signal for generating a corresponding input signal ;
a subband echo cancellation circuit coupled between the output signal source and the microphone for generating a subband reduced-echo signal by reducing the component of the first acoustic signal in the input signal ;
and a subband noise suppression circuit responsive to the subband reduced-echo signal for generating a subband reduced-noise reduced-echo signal without generating an intermediary full-band signal by reducing noise in the subband reduced-echo signal further comprising a near-end speech detector (sound activity detection) comprising : echo detector for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal ;
an echo canceller output detector for determining an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a comparator circuit for simultaneously comparing (i) EPR to a first predetermined threshold level and (ii) ECER to a second predetermined threshold level ;
and a detector circuit responsive to the comparator circuit for indicating that near-end speech is present when EPR and ECER simultaneously respectively exceed the first and second predetermined threshold levels .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises detecting the sound signal based on a frequency dependent signal-to-noise ratio (SNR) .
US5933495A
CLAIM 13
. An echo cancellation and noise suppression apparatus comprising : a loudspeaker responsive to an output signal source for generating a corresponding first acoustic signal ;
a microphone responsive to a second acoustic signal which includes a component of the first acoustic signal for generating a corresponding input signal ;
a subband echo cancellation circuit coupled between the output signal source and the microphone for generating a subband reduced-echo signal by reducing the component of the first acoustic signal in the input signal ;
and a subband noise suppression circuit responsive to the subband reduced-echo signal for generating a subband reduced-noise reduced-echo signal without generating an intermediary full-band signal by reducing noise in the subband reduced-echo signal further comprising a near-end speech detector (sound activity detection) comprising : echo detector for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal ;
an echo canceller output detector for determining an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a comparator circuit for simultaneously comparing (i) EPR to a first predetermined threshold level and (ii) ECER to a second predetermined threshold level ;
and a detector circuit responsive to the comparator circuit for indicating that near-end speech is present when EPR and ECER simultaneously respectively exceed the first and second predetermined threshold levels .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) comprises comparing an average signal-to-noise ratio (SNR_av) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR_LT) .
US5933495A
CLAIM 13
. An echo cancellation and noise suppression apparatus comprising : a loudspeaker responsive to an output signal source for generating a corresponding first acoustic signal ;
a microphone responsive to a second acoustic signal which includes a component of the first acoustic signal for generating a corresponding input signal ;
a subband echo cancellation circuit coupled between the output signal source and the microphone for generating a subband reduced-echo signal by reducing the component of the first acoustic signal in the input signal ;
and a subband noise suppression circuit responsive to the subband reduced-echo signal for generating a subband reduced-noise reduced-echo signal without generating an intermediary full-band signal by reducing noise in the subband reduced-echo signal further comprising a near-end speech detector (sound activity detection) comprising : echo detector for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal ;
an echo canceller output detector for determining an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a comparator circuit for simultaneously comparing (i) EPR to a first predetermined threshold level and (ii) ECER to a second predetermined threshold level ;
and a detector circuit responsive to the comparator circuit for indicating that near-end speech is present when EPR and ECER simultaneously respectively exceed the first and second predetermined threshold levels .
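
The decision rule of claim 14 of US8990073B2 (average SNR compared against a threshold that depends on the long-term SNR) can be illustrated with a minimal Python sketch. The claim does not state the form of the threshold function. The linear form and the slope and offset values below are hypothetical placeholders.

```python
def sad_threshold(snr_long_term, slope=0.25, offset=2.0):
    """Assumed linear threshold as a function of the long-term SNR."""
    return slope * snr_long_term + offset


def sound_is_active(snr_average, snr_long_term):
    """Declare activity when the average SNR exceeds the adaptive threshold."""
    return snr_average > sad_threshold(snr_long_term)


if __name__ == "__main__":
    print(sad_threshold(20.0))
    print(sound_is_active(8.0, 20.0), sound_is_active(6.0, 20.0))
```

Making the threshold a function of the long-term SNR lets the detector be stricter in clean conditions and more permissive in noisy ones, under the assumed positive slope.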

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) in the sound signal further comprises using noise energy estimates (adaptive filter) calculated in a previous frame in a SNR calculation .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US5933495A
CLAIM 13
. An echo cancellation and noise suppression apparatus comprising : a loudspeaker responsive to an output signal source for generating a corresponding first acoustic signal ;
a microphone responsive to a second acoustic signal which includes a component of the first acoustic signal for generating a corresponding input signal ;
a subband echo cancellation circuit coupled between the output signal source and the microphone for generating a subband reduced-echo signal by reducing the component of the first acoustic signal in the input signal ;
and a subband noise suppression circuit responsive to the subband reduced-echo signal for generating a subband reduced-noise reduced-echo signal without generating an intermediary full-band signal by reducing noise in the subband reduced-echo signal further comprising a near-end speech detector (sound activity detection) comprising : echo detector for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal ;
an echo canceller output detector for determining an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a comparator circuit for simultaneously comparing (i) EPR to a first predetermined threshold level and (ii) ECER to a second predetermined threshold level ;
and a detector circuit responsive to the comparator circuit for indicating that near-end speech is present when EPR and ECER simultaneously respectively exceed the first and second predetermined threshold levels .
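
The frame ordering implied by claims 13, 15 and 16 of US8990073B2 (a frequency-dependent SNR computed with the noise estimates of the previous frame, followed by an update of the estimates for the next frame) can be illustrated with a minimal Python sketch. The per-band dB formula, the recursive smoothing and the alpha value are assumptions. The claims only fix the order of the two steps.

```python
import math


def process_frame(band_energy, noise_estimate, allow_update, alpha=0.9, eps=1e-12):
    """Return (average SNR in dB, noise estimate to use in the next frame)."""
    # Step 1: per-band SNR uses the estimate carried over from the previous frame.
    snr_db = [10.0 * math.log10(max(e, eps) / max(n, eps))
              for e, n in zip(band_energy, noise_estimate)]
    snr_average = sum(snr_db) / len(snr_db)
    # Step 2: update the estimate for the next frame unless the update is prevented.
    if allow_update:
        next_estimate = [alpha * n + (1.0 - alpha) * e
                         for e, n in zip(band_energy, noise_estimate)]
    else:
        next_estimate = list(noise_estimate)
    return snr_average, next_estimate


if __name__ == "__main__":
    snr, noise = process_frame([100.0, 100.0], [1.0, 1.0], allow_update=True, alpha=0.5)
    print(snr, noise)
```

The allow_update flag is where a tonal or music detection result, as in claims 11 and 20, would plug in to hold the estimate unchanged.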

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection (speech detector) further comprises updating the noise estimates (adaptive filter) for a next frame .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US5933495A
CLAIM 13
. An echo cancellation and noise suppression apparatus comprising : a loudspeaker responsive to an output signal source for generating a corresponding first acoustic signal ;
a microphone responsive to a second acoustic signal which includes a component of the first acoustic signal for generating a corresponding input signal ;
a subband echo cancellation circuit coupled between the output signal source and the microphone for generating a subband reduced-echo signal by reducing the component of the first acoustic signal in the input signal ;
and a subband noise suppression circuit responsive to the subband reduced-echo signal for generating a subband reduced-noise reduced-echo signal without generating an intermediary full-band signal by reducing noise in the subband reduced-echo signal further comprising a near-end speech detector (sound activity detection) comprising : echo detector for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal ;
an echo canceller output detector for determining an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a comparator circuit for simultaneously comparing (i) EPR to a first predetermined threshold level and (ii) ECER to a second predetermined threshold level ;
and a detector circuit responsive to the comparator circuit for indicating that near-end speech is present when EPR and ECER simultaneously respectively exceed the first and second predetermined threshold levels .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates (adaptive filter) for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal and a ratio between a second order and a sixteenth order of linear prediction residual error energies (input terminals) .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals (linear prediction residual error energies) for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates (adaptive filter) when a music signal is detected .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .
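
The detector logic recited in US5933495A claim 1 (two energy ratios, two thresholds, and a FREEZE signal that stops coefficient adaptation) can be illustrated with a minimal sketch. The claim text does not fix which signal is the numerator of each ratio, so the orientation below (microphone energy over speaker energy for EPR, echo-canceller output energy over speaker energy for ECER) is an assumption, as are the function names, the toy update rule, and the `eps` guard against division by zero.

```python
def energy(samples):
    """Sum of squared sample values for one frame."""
    return sum(x * x for x in samples)


def near_end_speech(mic, speaker, ec_out, epr_threshold, ecer_threshold, eps=1e-12):
    """Return True (assert FREEZE) when both ratios exceed their thresholds.

    Assumed orientation (not stated in the claim):
    EPR = E(mic) / E(speaker), ECER = E(ec_out) / E(speaker).
    """
    e_spk = energy(speaker) + eps
    epr = energy(mic) / e_spk
    ecer = energy(ec_out) / e_spk
    return epr > epr_threshold and ecer > ecer_threshold


def adapt(coeffs, gradient, step, freeze):
    """Toy coefficient update: coefficients stay unchanged while FREEZE is set."""
    if freeze:
        return list(coeffs)
    return [c + step * g for c, g in zip(coeffs, gradient)]
```

Note that this logic freezes an echo-canceller filter on near-end speech; it does not by itself compute a tonal stability parameter or a music decision as recited in US8990073B2 claim 20.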

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates (adaptive filter) on the music signal .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (band signal, band noise) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise (frequency bands, first frequency bands, first energy) -suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US5933495A
CLAIM 4
. A method of operating a hands-free telephone comprising a loudspeaker coupled to an output signal source for generating a corresponding acoustic signal and a microphone for generating an input signal , the method comprising the steps of : reducing the presence of the output signal in the input signal by subband domain acoustic echo cancellation for generating a subband reduced-echo signal ;
and suppressing noise in the subband reduced echo signal without generation an intermediary fullband signal (frequency bands, first frequency bands, first energy) , by subband domain noise suppression for generating a subband reduced-noise reduced echo signal ;
further comprising a step of reducing any residual echo signal in the subband reduced-echo reduced-noise signal .
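
For orientation, the computation recited in US8990073B2 claim 24 (a per-band ratio of current-frame to previous-frame energy, summed with weights over the bands above a given band number) might be sketched as follows. The function name, the uniform default weights, and the `eps` guard are assumptions for illustration only; the claim does not specify the weights.

```python
def spectral_diversity(cur_band_energy, prev_band_energy, min_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous energy ratios for bands with index > min_band."""
    n = len(cur_band_energy)
    if weights is None:
        weights = [1.0] * n  # assumed uniform weighting
    total = 0.0
    for i in range(min_band + 1, n):
        ratio = cur_band_energy[i] / (prev_band_energy[i] + eps)
        total += weights[i] * ratio
    return total
```

With band energies `[1, 2, 4, 8]` in the current frame, `[1, 1, 2, 2]` in the previous frame, and `min_band=1`, only bands 2 and 3 contribute.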

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having simultaneously the activity prediction parameter larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (band signal, band noise) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy (band signal, band noise) value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise (frequency bands, first frequency bands, first energy) -suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US5933495A
CLAIM 4
. A method of operating a hands-free telephone comprising a loudspeaker coupled to an output signal source for generating a corresponding acoustic signal and a microphone for generating an input signal , the method comprising the steps of : reducing the presence of the output signal in the input signal by subband domain acoustic echo cancellation for generating a subband reduced-echo signal ;
and suppressing noise in the subband reduced echo signal without generation an intermediary fullband signal (frequency bands, first frequency bands, first energy) , by subband domain noise suppression for generating a subband reduced-noise reduced echo signal ;
further comprising a step of reducing any residual echo signal in the subband reduced-echo reduced-noise signal .
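
The four steps of US8990073B2 claim 28 (split the bands into two groups, compute an energy per group, take the ratio, and track a long-term value) can be sketched as below. The first-order recursive smoothing and its factor `alpha` are assumptions; the claim only requires that the long-term value be based on the calculated parameter. Function names and the `eps` guard are likewise hypothetical.

```python
def noise_character(band_energy, n_first, eps=1e-12):
    """Ratio of the energy in the first n_first bands to the energy in the remaining bands."""
    first_energy = sum(band_energy[:n_first])
    second_energy = sum(band_energy[n_first:])
    return first_energy / (second_energy + eps)


def update_long_term(previous_lt, current, alpha=0.9):
    """Assumed first-order smoothing of the noise character parameter."""
    return alpha * previous_lt + (1.0 - alpha) * current
```

A caller would evaluate `noise_character` once per frame and feed the result into `update_long_term`, carrying the returned value to the next frame.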

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates (adaptive filter) is prevented in response to having the noise character parameter inferior than a given fixed threshold .
US5933495A
CLAIM 1
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a subband noise-suppresser circuit , coupled to the echo canceller circuit and receiving said subband reduced-echo microphone signal , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal ;
wherein the echo canceller circuit comprises an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for transforming the speaker signal into a filtered subband speaker signal ;
wherein the detector circuit comprises : a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , further comprising a detector circuit responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity detector (signal source) .
US5933495A
CLAIM 4
. A method of operating a hands-free telephone comprising a loudspeaker coupled to an output signal source (sound activity detector) for generating a corresponding acoustic signal and a microphone for generating an input signal , the method comprising the steps of : reducing the presence of the output signal in the input signal by subband domain acoustic echo cancellation for generating a subband reduced-echo signal ;
and suppressing noise in the subband reduced echo signal without generation an intermediary fullband signal , by subband domain noise suppression for generating a subband reduced-noise reduced echo signal ;
further comprising a step of reducing any residual echo signal in the subband reduced-echo reduced-noise signal .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector (signal source) comprises a comparator of an average signal to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US5933495A
CLAIM 4
. A method of operating a hands-free telephone comprising a loudspeaker coupled to an output signal source (sound activity detector) for generating a corresponding acoustic signal and a microphone for generating an input signal , the method comprising the steps of : reducing the presence of the output signal in the input signal by subband domain acoustic echo cancellation for generating a subband reduced-echo signal ;
and suppressing noise in the subband reduced echo signal without generation an intermediary fullband signal , by subband domain noise suppression for generating a subband reduced-noise reduced echo signal ;
further comprising a step of reducing any residual echo signal in the subband reduced-echo reduced-noise signal .
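
The comparator of US8990073B2 claim 38 compares an average SNR against a threshold that depends on the long-term SNR. A minimal sketch follows; the linear form of the threshold function and the `slope` and `offset` values are purely illustrative assumptions, since the claim only states that the threshold is a function of SNR LT.

```python
def sad_threshold(snr_lt_db, slope=0.1, offset=2.0):
    """Assumed linear threshold as a function of the long-term SNR (in dB)."""
    return slope * snr_lt_db + offset


def is_active(snr_av_db, snr_lt_db, slope=0.1, offset=2.0):
    """Declare sound activity when the average SNR exceeds the adaptive threshold."""
    return snr_av_db > sad_threshold(snr_lt_db, slope, offset)
```

The "output signal source" of US5933495A claim 4 that the chart maps to this element is a signal source, not a comparator of this kind.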

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator for updating noise energy estimates (adaptive filter) in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector (signal source) .
US5933495A
CLAIM 4
. A method of operating a hands-free telephone comprising a loudspeaker coupled to an output signal source (sound activity detector) for generating a corresponding acoustic signal and a microphone for generating an input signal , the method comprising the steps of : reducing the presence of the output signal in the input signal by subband domain acoustic echo cancellation for generating a subband reduced-echo signal ;
and suppressing noise in the subband reduced echo signal without generation an intermediary fullband signal , by subband domain noise suppression for generating a subband reduced-noise reduced echo signal ;
further comprising a step of reducing any residual echo signal in the subband reduced-echo reduced-noise signal .

US5933495A
CLAIM 14
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , and comprising an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a noise-suppresser circuit , coupled to the echo canceller circuit , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
a detector circuit comprising a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , and responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates (adaptive filter) .
US5933495A
CLAIM 14
. A conditioning circuit , comprising : microphone-in and speaker-line input terminals for respectively receiving microphone and speaker signals , and a microphone-out output terminal ;
an echo canceller circuit , coupled between the microphone-in and speaker-line input terminals , and comprising an adaptive filter (noise energy estimates, noise estimates, updating noise energy estimates) having changeable filter coefficients responsive to feedback of the subband reduced-echo microphone signal , for producing a subband reduced-echo microphone signal by (i) transforming the microphone signal into a subband microphone signal and the speaker signal into a filtered subband speaker signal , and (ii) subband subtracting the filtered subband speaker signal from the subband microphone signal ;
a noise-suppresser circuit , coupled to the echo canceller circuit , for producing a subband reduced-noise , reduced-echo microphone signal by subband noise suppression of the subband reduced-echo microphone signal ;
a detector circuit comprising a circuit for determining an echo path energy ratio (EPR) as an energy ratio between the output signal and the input signal and an echo canceller energy ratio (ECER) as an energy ratio between an output of the echo cancellation circuit and the output signal ;
a circuit for comparing EPR to a first predetermined threshold level and ECER to a second predetermined threshold level ;
and a sensing circuit for determining that near-end speech is present when EPR exceeds the first predetermined threshold level and at the same time ECER exceeds the second predetermined threshold level , and responsive to the subband microphone signal , subband reduced-echo microphone signal , and reduced-noise , reduced-echo microphone signal for generating a FREEZE control signal only when the microphone signal contains near-end speech (speech actually voiced near the microphone) ;
the adaptive filter being responsive to the FREEZE control signal for disabling the filter coefficients from changing ;
and a synthesis filter , coupled between the noise-suppresser circuit and the microphone-out terminal , for transforming the subband reduced-noise , reduced-echo microphone signal into a fullband reduced-noise , reduced-echo microphone signal .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5845243A

Filed: 1997-02-03     Issued: 1998-12-01

Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of audio information

(Original Assignee) U S Robotics Mobile Communications Corp     (Current Assignee) HP Inc ; U S Robotics Mobile Communications Corp ; Hewlett Packard Enterprise Development LP

Kevin Smart, Jiankan J. Yang
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value of the long term correlation map .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal in the current frame (data frames) ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .
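
To make the steps of US8990073B2 claims 1 and 2 concrete, the following is a minimal sketch under stated assumptions: the spectral floor is built by linear interpolation between strict local minima, bins outside the first and last minimum get a zero residual, the peak boundaries reuse the minima of the input spectrum rather than recomputing minima on the residual, the correlation is a normalized squared cross-correlation, and the long-term map starts at zero and is updated with a fixed factor `alpha`. None of these specifics are stated in the claims; all names are hypothetical.

```python
def local_minima(spec):
    """Indices of strict interior local minima of the spectrum."""
    return [i for i in range(1, len(spec) - 1)
            if spec[i - 1] > spec[i] < spec[i + 1]]


def residual_spectrum(spec):
    """Subtract a floor obtained by connecting successive minima with straight lines."""
    minima = local_minima(spec)
    residual = [0.0] * len(spec)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            floor = spec[a] + (spec[b] - spec[a]) * (i - a) / (b - a)
            residual[i] = spec[i] - floor
    return residual, minima


def correlation_map(cur_res, prev_res, minima, eps=1e-12):
    """Per-peak normalized correlation between current and previous residual spectra."""
    cor_map = [0.0] * len(cur_res)
    for a, b in zip(minima, minima[1:]):
        cur = cur_res[a:b + 1]
        prev = prev_res[a:b + 1]
        num = sum(c * p for c, p in zip(cur, prev)) ** 2
        den = sum(c * c for c in cur) * sum(p * p for p in prev) + eps
        for i in range(a, b + 1):
            cor_map[i] = num / den  # same value for every bin of the peak
    return cor_map


def update_long_term_map(lt_map, cor_map, alpha=0.9):
    """Recursive update of the long-term correlation map with update factor alpha."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cor_map)]
```

A tonal stability measure could then be derived from the long-term map, for example by summing its bins. By contrast, the wavelet compression steps of US5845243A claim 16 operate on data frames but recite no spectral floor, peak detection, or correlation map.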

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal comprises comparing the summed long-term correlation map with an adaptive threshold (quantized coefficient) indicative of sound activity in the sound signal .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients (adaptive threshold) ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (sampled data) from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal prevents updating of noise energy estimates when a music signal (sampled data) is detected .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter in order to distinguish a music signal (sampled data) from a background noise signal and prevent update of noise energy estimates on the music signal .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame (data frames) energy and an average frame energy .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame (data frames) and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .
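
For orientation only, the spectral diversity calculation recited in claim 24 of US8990073B2 (a per-band energy ratio between the current and previous frame, summed with weights over the bands above a given index) might be sketched in Python as below. This is an illustrative reading of the claim language, not the patentee's implementation: the function name, the uniform default weights, and the handling of zero-energy bands are assumptions.

```python
def spectral_diversity(cur_energy, prev_energy, first_band, weights=None):
    """Weighted sum of current/previous energy ratios over bands >= first_band.

    cur_energy, prev_energy: per-band energies of the current and previous frame.
    first_band: the "given number"; lower bands are ignored.
    weights: optional per-band weights (hypothetical; defaults to 1.0 for every band).
    """
    total = 0.0
    for band in range(first_band, len(cur_energy)):
        if prev_energy[band] <= 0.0:
            continue  # assumption: skip bands with no energy in the previous frame
        weight = 1.0 if weights is None else weights[band]
        total += weight * cur_energy[band] / prev_energy[band]
    return total


diversity = spectral_diversity([1.0, 2.0, 4.0, 8.0], [1.0, 1.0, 2.0, 2.0], first_band=2)
```

Nothing in claim 16 of US5845243A recites a frame-to-frame energy ratio of this kind; the sketch only illustrates the target claim element.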

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (digital audio) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames containing a predefined number of digital audio (second energy, second energy value, second energy values) data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .
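
For orientation only, the noise character parameter of claim 28 of US8990073B2 (split the frequency bands into a first group and the rest, take the ratio of the two group energies, then track a long-term value) might be sketched in Python as below. The function names, the choice of which group forms the numerator, and the smoothing factor alpha are assumptions made for the example; the claim does not fix them.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy in the remaining bands."""
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    if second_energy <= 0.0:
        return float("inf")  # assumption: an empty second group gives an unbounded ratio
    return first_energy / second_energy


def update_long_term(long_term, value, alpha=0.9):
    """Recursive average of the parameter; alpha is a hypothetical smoothing factor."""
    return alpha * long_term + (1.0 - alpha) * value


ratio = noise_character([1.0, 1.0, 2.0, 2.0, 4.0], n_first=2)
long_term_ratio = update_long_term(0.0, ratio, alpha=0.5)
```

The mapped phrase "digital audio" in claim 16 of US5845243A refers to sampled input data rather than to a band-group energy, so the sketch illustrates the target claim element only.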

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value of the long-term correlation map .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame (data frames) , and an initial value of the long-term correlation map .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal in the current frame (data frames) ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5845243A
CLAIM 16
. A method for compressing digitally sampled audio data which has been divided into data frames (current frame) containing a predefined number of digital audio data samples , the method comprising the steps of : a) performing a discrete wavelet transform on the data frame to obtain the corresponding wavelet coefficients ;
b) decomposing the resultant wavelet coefficients into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter used to eliminate wavelet coefficients in this frame in order to achieve a desired average bit rate ;
d) selecting a quantization level for the wavelet coefficients based on a psychoacoustic model which uses one or more parameters derived from the data in the data frame ;
e) quantizing the wavelet coefficients at the selected quantization level ;
f) entropy encoding the quantized wavelet coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter so that the desired average bit rate is achieved .

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal (sampled data) from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal (sampled data) from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal for distinguishing a music signal (sampled data) from a background noise signal and preventing update of noise energy estimates .
US5845243A
CLAIM 1
. A method for compressing digitally sampled audio data comprising the steps of : a) determining a desired average bit rate ;
b) performing a discrete wavelet transform on the digitally sampled data (music signal) to obtain the resultant wavelet coefficients in such a manner that the resultant wavelet coefficients fall into critical bands that approximate a psychoacoustic model ;
c) calculating a control parameter related to the fractional percentage of wavelet coefficients which must be eliminated to achieve the desired average bit rate ;
d) using said control parameter to eliminate wavelet coefficients according to a predetermined criteria ;
e) quantizing the wavelet coefficients using a selected quantization level ;
f) entropy encoding the quantized coefficients ;
and g) feeding the number of bits used to represent the entropy encoded coefficients back into the calculation of the control parameter used to eliminate wavelet coefficients so that the desired average bit rate is achieved .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
JPH10214100A

Filed: 1997-01-31     Issued: 1998-08-11

Speech synthesis method (音声合成方法)

(Original Assignee) Sony Corp; ソニー株式会社     

Kazuyuki Iijima, Atsushi Matsumoto, Masayuki Nishiguchi, 淳 松本, 正之 西口, 和幸 飯島
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
JPH10214100A
CLAIM 1
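[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.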
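
For orientation only, the four steps recited in claim 1 of US8990073B2 (residual spectrum, peak detection between successive minima, per-peak correlation map, long-term correlation map) might be sketched in Python as below. This is a minimal illustration and not the patentee's implementation: the helper names, the piecewise-linear spectral floor, the normalized squared correlation used for the correlation map, and the update factor alpha = 0.9 are all assumptions, since the claim language does not fix them.

```python
def local_minima(x):
    """Indices of interior local minima plus both end bins (assumes len(x) >= 2)."""
    inner = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] <= x[i + 1]]
    return [0] + inner + [len(x) - 1]


def residual_spectrum(spectrum):
    """Subtract a spectral floor formed by joining the minima with straight lines (assumed shape)."""
    mins = local_minima(spectrum)
    floor = [0.0] * len(spectrum)
    for a, b in zip(mins, mins[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return [s - f for s, f in zip(spectrum, floor)]


def correlation_map(cur_res, prev_res):
    """Per-peak normalized squared correlation with the same bins of the previous frame."""
    cmap = [0.0] * len(cur_res)
    mins = local_minima(cur_res)
    for a, b in zip(mins, mins[1:]):  # each peak spans two successive minima
        bins = range(a, b + 1)
        num = sum(cur_res[i] * prev_res[i] for i in bins) ** 2
        den = sum(cur_res[i] ** 2 for i in bins) * sum(prev_res[i] ** 2 for i in bins)
        value = num / den if den > 0 else 0.0
        for i in bins:  # a bin shared by two peaks keeps the later peak's value
            cmap[i] = value
    return cmap


def update_long_term(long_term, cmap, alpha=0.9):
    """Recursive smoothing of the map; alpha is a hypothetical update factor."""
    return [alpha * lt + (1.0 - alpha) * c for lt, c in zip(long_term, cmap)]


prev_res = residual_spectrum([1.0, 3.0, 1.0, 4.0, 1.0])
cur_res = residual_spectrum([1.0, 3.0, 1.0, 4.0, 1.0])
long_term = update_long_term([0.0] * 5, correlation_map(cur_res, prev_res))
```

Two identical frames give a correlation of 1.0 at every peak, so the long-term map rises from its initial value toward 1.0 over successive frames. The JPH10214100A claim, by contrast, recites frame segmentation, pitch detection, and harmonic phase initialization, none of which appears in this sketch.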

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
JPH10214100A
CLAIM 1
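[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.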

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (音声信号) .
JPH10214100A
CLAIM 1
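[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.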

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises searching in the correlation map for frequency bins having a magnitude that exceeds a given fixed threshold .
JPH10214100A
CLAIM 1
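[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.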

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (音声信号) comprises comparing the summed long-term correlation map with an adaptive threshold indicative of sound activity in the sound signal .
JPH10214100A
CLAIM 1
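[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.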

US8990073B2
CLAIM 10
. A method for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
JPH10214100A
CLAIM 1
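[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.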

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (音声信号) is detected .
JPH10214100A
CLAIM 1
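[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.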

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity in the sound signal (音声信号) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
JPH10214100A
CLAIM 1
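[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.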

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection comprises detecting the sound signal (音声信号) based on a frequency dependent signal-to-noise ratio (SNR) .
JPH10214100A
CLAIM 1
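[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.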

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal (音声信号) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
JPH10214100A
CLAIM 1
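[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.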

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (音声信号) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
JPH10214100A
CLAIM 1
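[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.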

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
JPH10214100A
CLAIM 1
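[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.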

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (音声信号) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
JPH10214100A
CLAIM 1
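[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.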

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (音声信号) prevents updating of noise energy estimates when a music signal is detected .
JPH10214100A
CLAIM 1
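[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.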

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (音声信号) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
JPH10214100A
CLAIM 1
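[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.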

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (音声信号) .
JPH10214100A
CLAIM 1
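[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.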

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (音声信号) and the complementary non-stationarity parameter .
JPH10214100A
CLAIM 1
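[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.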

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group (えること) of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
JPH10214100A
CLAIM 1
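[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given (与えること; first group) to the odd-numbered harmonics and the even-numbered harmonics.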

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH10214100A
CLAIM 1
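[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.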

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (音声信号) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
JPH10214100A
CLAIM 1
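[Claim 1] A speech synthesis method in which an input signal based on a speech signal (音声信号; sound signal) is divided into frames on the time axis, a pitch is determined for each divided frame and the frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.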

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (音声信号) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
JPH10214100A
CLAIM 1
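[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.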
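
For orientation when reading the residual-spectrum elements of claims 31 and 32 (locate the minima, connect them into a spectral floor, subtract the floor), the steps can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `residual_spectrum`, the simple neighbour test for minima, the straight-line floor, and the zeroing of bins outside the first and last minimum are all assumptions made for the example.

```python
import numpy as np

def residual_spectrum(spectrum):
    """Subtract a piecewise-linear spectral floor, drawn through the
    local minima of the spectrum, from the spectrum itself."""
    spectrum = np.asarray(spectrum, dtype=float)
    n = len(spectrum)
    # Local minima: interior bins not larger than either neighbour (assumed test).
    minima = [k for k in range(1, n - 1)
              if spectrum[k] <= spectrum[k - 1] and spectrum[k] <= spectrum[k + 1]]
    if len(minima) < 2:
        return np.zeros(n), minima
    # "Connect the minima with each other": linear interpolation between them.
    floor = np.interp(np.arange(n), minima, spectrum[minima])
    residual = spectrum - floor
    # Outside the first and last minimum no floor is defined here, so zero those bins.
    residual[:minima[0]] = 0.0
    residual[minima[-1]:] = 0.0
    return residual, minima
```

Each stretch of the residual between two successive minima is then one "peak" in the sense of claim 31.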

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (音声信号) .
JPH10214100A
CLAIM 1
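[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.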

US8990073B2
CLAIM 35
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
JPH10214100A
CLAIM 1
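[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.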

US8990073B2
CLAIM 36
. A device for detecting sound activity in a sound signal (音声信号) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator (フレーム) of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
JPH10214100A
CLAIM 1
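[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames [フレーム] (tonal stability estimator) on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.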

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (音声信号) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
JPH10214100A
CLAIM 1
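[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.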

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (音声信号) .
JPH10214100A
CLAIM 1
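[Claim 1] A speech synthesis method in which an input signal based on a speech signal [音声信号] (sound signal) is divided into frames on the time axis, a pitch is determined for each of the divided frames and each frame is judged to be either voiced or unvoiced, and voiced sound is synthesized using harmonics of the determined pitch, characterized in that, at a transition from a frame judged to be unvoiced to a frame judged to be voiced, the phases of the harmonics are initialized, and different initial values are given to the odd-numbered harmonics and the even-numbered harmonics.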




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6097820A

Filed: 1996-12-23     Issued: 2000-08-01

System and method for suppressing noise in digitally represented voice signals

(Original Assignee) Nokia of America Corp     (Current Assignee) Nokia of America Corp

Michael D. Turner
US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin (fast Fourier transform) by frequency bin basis ;

and summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6097820A
CLAIM 5
. The noise suppressor as recited in claim 1 wherein said frequency domain transformation circuitry and said time domain transformation circuitry each comprise fast Fourier transform (frequency bin) (FFT) circuitry .
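
To make the claim 5 language concrete, a one-pole filter applied bin by bin, followed by a sum over bins, might look like the sketch below. The recursion form `alpha * old + (1 - alpha) * new`, the name `update_long_term_map`, and the default `alpha=0.9` are assumptions chosen for illustration; the claims only require an update factor, the current-frame correlation map, and an initial value of the long-term map.

```python
import numpy as np

def update_long_term_map(long_term_map, correlation_map, alpha=0.9):
    """One-pole (exponential) smoothing applied independently to every
    frequency bin, then summed over bins.

    long_term_map   : previous long-term map (its first value is the initial value)
    correlation_map : correlation map of the current frame
    alpha           : update factor (assumed default)
    """
    long_term_map = np.asarray(long_term_map, dtype=float)
    correlation_map = np.asarray(correlation_map, dtype=float)
    updated = alpha * long_term_map + (1.0 - alpha) * correlation_map
    # The summed map is a single number per frame that can be thresholded.
    return updated, float(updated.sum())
```

By contrast, claim 5 of US6097820A recites only that the transformation circuitry is FFT circuitry; the mapping to "frequency bin" rests on that term alone.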

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection in the sound signal further comprises using noise energy estimates calculated in a previous frame in a SNR calculation (noise ratio) .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio (noise ratio, SNR LT, SNR calculation) of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity detection further comprises updating the noise estimates (said model) for a next frame .
US6097820A
CLAIM 13
. A noise suppressor that increases a signal to noise ratio of time domain digital audio data , comprising : a voice activation detector (VAD) that detects when a frame of said time domain digital audio data contains substantially no signal ;
initial fast Fourier transformation (FFT) circuitry that buffers and transforms said frame of time domain digital audio data into a frame of frequency domain digital audio data ;
noise background modeling circuitry , coupled to said VAD and said initial FFT circuitry , that spectrally analyzes said frame of frequency domain digital audio data and exponentially smooths said frame of frequency domain digital audio data with past frames of said frequency domain digital audio data to update a model of an estimated noise background spectrum thereof when said VAD detects that said frame contains substantially no signal ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of said frequency domain digital audio data as a function of said model (noise estimates, noise estimator) ;
and subsequent FFT circuitry , coupled to said frequency domain suppression filter , that transforms said frame of frequency domain digital audio data back into said time domain , said transformed frame of time domain digital audio data having an increased signal to noise ratio .
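
The noise background modeling of US6097820A claim 13 (exponential smoothing of the frame spectrum into a noise model, updated only when the VAD reports no signal) can be sketched as below. The names `update_noise_model` and `suppress`, the smoothing constant, and the use of plain spectral subtraction as the suppression filter are assumptions; the claim does not specify the filter beyond it being a function of the model.

```python
import numpy as np

def update_noise_model(noise_model, frame_spectrum, vad_no_signal, smoothing=0.9):
    """Exponentially smooth the current frame into the noise model,
    but only for frames the VAD marks as containing no signal."""
    noise_model = np.asarray(noise_model, dtype=float)
    frame_spectrum = np.asarray(frame_spectrum, dtype=float)
    if not vad_no_signal:
        return noise_model  # model is frozen while signal is present
    return smoothing * noise_model + (1.0 - smoothing) * frame_spectrum

def suppress(frame_spectrum, noise_model):
    """Stand-in suppression filter: subtract the modeled noise, clamp at zero."""
    diff = np.asarray(frame_spectrum, dtype=float) - np.asarray(noise_model, dtype=float)
    return np.maximum(diff, 0.0)
```

The gating on the VAD decision is the point of comparison with the "updating the noise estimates" language of claims 16 and 39.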

US8990073B2
CLAIM 21
. A method as defined in claim 10 , further comprising calculating a complementary non-stationarity parameter and a noise character parameter (speech signal, said time) in order to distinguish a music signal from a background noise signal and prevent update of noise energy estimates on the music signal .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy (noise suppressor) and an average frame energy .
US6097820A
CLAIM 1
. A noise suppressor (current frame energy) that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter (speech signal, said time) indicative of an activity of the sound signal .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter (speech signal, said time) comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal and the complementary non-stationarity parameter .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US8990073B2
CLAIM 27
. A method as defined in claim 25 , wherein the update of the noise energy estimates is prevented in response to having simultaneously the activity prediction parameter (speech signal, said time) larger than a first given fixed threshold and the complementary non-stationarity parameter larger than a second given fixed threshold .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter (speech signal, said time) comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy (digital audio) value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US6097820A
CLAIM 13
. A noise suppressor that increases a signal to noise ratio of time domain digital audio (second energy, second energy value, second energy values) data , comprising : a voice activation detector (VAD) that detects when a frame of said time domain digital audio data contains substantially no signal ;
initial fast Fourier transformation (FFT) circuitry that buffers and transforms said frame of time domain digital audio data into a frame of frequency domain digital audio data ;
noise background modeling circuitry , coupled to said VAD and said initial FFT circuitry , that spectrally analyzes said frame of frequency domain digital audio data and exponentially smooths said frame of frequency domain digital audio data with past frames of said frequency domain digital audio data to update a model of an estimated noise background spectrum thereof when said VAD detects that said frame contains substantially no signal ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of said frequency domain digital audio data as a function of said model ;
and subsequent FFT circuitry , coupled to said frequency domain suppression filter , that transforms said frame of frequency domain digital audio data back into said time domain , said transformed frame of time domain digital audio data having an increased signal to noise ratio .
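
The four steps of claim 28 (split the bands into two groups, compute an energy per group, take the ratio, track a long-term value) can be sketched as follows. The direction of the ratio (first group over second), the exponential form of the long-term value, and the names `noise_character` and `long_term_value` are assumptions for illustration only.

```python
def noise_character(band_energies, n_first):
    """Ratio of the energy in the first n_first bands to the energy
    in the remaining bands (ratio direction is an assumption)."""
    first_energy = sum(band_energies[:n_first])
    second_energy = sum(band_energies[n_first:])
    if second_energy <= 0.0:
        return float("inf")
    return first_energy / second_energy

def long_term_value(previous, current, alpha=0.9):
    """Assumed exponential averaging of the per-frame parameter."""
    return alpha * previous + (1.0 - alpha) * current
```

None of the cited US6097820A claims recites a band split or an energy ratio; the chart maps them only through the phrases "said time", "speech/no speech signal" and "digital audio".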

US8990073B2
CLAIM 29
. A method as defined in claim 28 , wherein the update of the noise energy estimates is prevented in response to having the noise character parameter (speech signal, said time) inferior than a given fixed threshold .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time (noise character parameter, activity prediction parameter) domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US6097820A
CLAIM 3
. The noise suppressor as recited in claim 1 wherein said noise background modeling circuitry is coupled to a voice activity detector (VAD) , said noise background modeling circuitry modeling said estimated noise background spectrum as a function of a speech/no speech signal (noise character parameter, activity prediction parameter) received from said VAD .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin (fast Fourier transform) by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins so as to produce a summed long-term correlation map .
US6097820A
CLAIM 5
. The noise suppressor as recited in claim 1 wherein said frequency domain transformation circuitry and said time domain transformation circuitry each comprise fast Fourier transform (frequency bin) (FFT) circuitry .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity detector comprises a comparator of an average signal to noise ratio (noise ratio) (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US6097820A
CLAIM 1
. A noise suppressor that increases a signal to noise ratio (noise ratio, SNR LT, SNR calculation) of time domain audio data , comprising : frequency domain transformation circuitry that transforms a frame of said time domain audio data into a frame of frequency domain audio data ;
noise background modeling circuitry , coupled to said frequency domain transformation circuitry , that spectrally analyzes said frame of frequency domain audio data and exponentially smooths said frame with past frames of said frequency domain audio data to model an estimated noise background spectrum thereof ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of frequency domain audio data ;
and time domain transformation circuitry , coupled to said frequency domain suppression filter , that transforms said frame back into said time domain , said transformed frame of time domain audio data having an increased signal to noise ratio .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (said model) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity detector .
US6097820A
CLAIM 13
. A noise suppressor that increases a signal to noise ratio of time domain digital audio data , comprising : a voice activation detector (VAD) that detects when a frame of said time domain digital audio data contains substantially no signal ;
initial fast Fourier transformation (FFT) circuitry that buffers and transforms said frame of time domain digital audio data into a frame of frequency domain digital audio data ;
noise background modeling circuitry , coupled to said VAD and said initial FFT circuitry , that spectrally analyzes said frame of frequency domain digital audio data and exponentially smooths said frame of frequency domain digital audio data with past frames of said frequency domain digital audio data to update a model of an estimated noise background spectrum thereof when said VAD detects that said frame contains substantially no signal ;
a frequency domain suppression filter , coupled to said noise background modeling circuitry , that filters at least some of said noise background spectrum from said frame of said frequency domain digital audio data as a function of said model (noise estimates, noise estimator) ;
and subsequent FFT circuitry , coupled to said frequency domain suppression filter , that transforms said frame of frequency domain digital audio data back into said time domain , said transformed frame of time domain digital audio data having an increased signal to noise ratio .




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US6570991B1

Filed: 1996-12-18     Issued: 2003-05-27

Multi-feature speech/music discrimination system

(Original Assignee) Interval Research Corp     (Current Assignee) Vulcan Patents LLC

Eric D. Scheirer, Malcolm Slaney
US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value (successive frames) with the previous residual spectrum , over frequency bins between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US6570991B1
CLAIM 11
. The method of claim 1 wherein said audio signal is divided into a sequence of frames and further including the steps of classifying each frame of the test sample as relating to speech or music , examining the classifications for a plurality of successive frames (correlation value) , and determining a final classification on the basis of the examined classifications .
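
As a reading aid for claim 4, the per-peak normalized correlation and the resulting correlation map can be sketched as below. The cosine-style normalization, the half-open bin range `[lo, hi)` for each peak, and the name `correlation_map` are assumptions; the claim itself only requires a normalized correlation value computed over the bins between the two minima that delimit each peak.

```python
import numpy as np

def correlation_map(current_residual, previous_residual, minima):
    """For each peak (bins between two consecutive minima of the current
    residual spectrum), correlate it with the same bins of the previous
    residual spectrum and write that score over the peak's bins."""
    cur = np.asarray(current_residual, dtype=float)
    prev = np.asarray(previous_residual, dtype=float)
    cmap = np.zeros(len(cur))
    for lo, hi in zip(minima[:-1], minima[1:]):
        a, b = cur[lo:hi], prev[lo:hi]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        # Assumed normalization: cosine similarity, 0 when either side is empty.
        score = float(np.dot(a, b) / denom) if denom > 0.0 else 0.0
        cmap[lo:hi] = score
    return cmap
```

The cited claim 11 of US6570991B1, in comparison, examines per-frame speech/music classifications over successive frames rather than correlating spectral shapes.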

US8990073B2
CLAIM 22
. A method as defined in claim 21 , further comprising : detecting a spectral attack ;

calculating the complementary non-stationarity parameter based on an element selected from the group consisting of a current frame energy and an average frame energy (dimensional feature space, feature values) .
US6570991B1
CLAIM 1
. A method for discriminating between speech and music content in an audio signal , comprising the steps of : selecting a set of audio signal samples ;
measuring values for a plurality of features in each sample of said set of samples ;
defining a multi-dimensional feature space (average frame energy) containing data points which respectively correspond to the measured feature values (average frame energy) for each sample , and labelling each data point as relating to speech or music ;
measuring feature values for a test sample of an audio signal and determining a corresponding data point in said feature space ;
determining the label for at least one data point in said feature space which is close to the data point corresponding to said test sample ;
and classifying the test sample in accordance with the determined label .
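
The classification steps of US6570991B1 claim 1 (labelled points in a feature space, then labelling a test sample from a nearby point) amount to a nearest-neighbour lookup. A minimal sketch, assuming Euclidean distance and a single nearest neighbour; the function name `classify` and the example feature values are hypothetical.

```python
import math

def classify(test_point, labelled_points):
    """Return the label of the labelled data point closest to test_point.

    labelled_points is a list of (feature_vector, label) pairs.
    """
    nearest = min(labelled_points, key=lambda item: math.dist(item[0], test_point))
    return nearest[1]

# Hypothetical two-feature training set.
training = [((0.0, 0.0), "speech"), ((10.0, 10.0), "music")]
print(classify((1.0, 2.0), training))
```

Nothing in this step involves frame energies as such, which is worth keeping in mind when weighing the chart's mapping of "feature values" to "average frame energy".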

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal in a current frame and an energy of the sound signal in a previous frame , for frequency bands (frequency bands) higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US6570991B1
CLAIM 6
. The method of claim 1 wherein one of said features is a pulse metric which identifies correspondence of modulation frequency peaks in different respective frequency bands (frequency bands) of the audio signal .
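
The spectral diversity computation of claim 24 (per-band ratio of current to previous frame energy, for bands above a given index, combined as a weighted sum) can be sketched as follows. Uniform default weights, the small `eps` guard against division by zero, and the name `spectral_diversity` are assumptions made for the example.

```python
def spectral_diversity(current, previous, given_band, weights=None, eps=1e-12):
    """Weighted sum of current/previous band-energy ratios over all
    bands whose index is higher than given_band."""
    total = 0.0
    for i in range(given_band + 1, len(current)):
        ratio = current[i] / max(previous[i], eps)
        weight = 1.0 if weights is None else weights[i]
        total += weight * ratio
    return total
```

The cited claim 6 of US6570991B1 shares only the term "frequency bands"; its pulse metric compares modulation frequency peaks across bands, not frame-to-frame energy ratios.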

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands (frequency bands) into a first group of a certain number of first frequency bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US6570991B1
CLAIM 1
. A method for discriminating between speech and music content in an audio signal , comprising the steps of : selecting a set of audio signal samples ;
measuring values for a plurality of features in each sample of said set of samples ;
defining a multi-dimensional feature space containing data (first frequency bands) points which respectively correspond to the measured feature values for each sample , and labelling each data point as relating to speech or music ;
measuring feature values for a test sample of an audio signal and determining a corresponding data point in said feature space ;
determining the label for at least one data point in said feature space which is close to the data point corresponding to said test sample ;
and classifying the test sample in accordance with the determined label .

US6570991B1
CLAIM 6
. The method of claim 1 wherein one of said features is a pulse metric which identifies correspondence of modulation frequency peaks in different respective frequency bands (frequency bands) of the audio signal .

US6570991B1
CLAIM 7
. The method of claim 1 wherein one of said features is measured by the steps of determining the mean power for a series of frames of said audio signal , and determining the proportion of frames in said series (first frequency) whose power is less than a predetermined fraction of said mean power .
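
The feature recited in US6570991B1 claim 7 (the proportion of frames whose power falls below a fraction of the mean power) is simple enough to state directly. In this sketch the default `fraction=0.5` and the name `low_power_proportion` are assumptions; the claim says only "a predetermined fraction".

```python
def low_power_proportion(frame_powers, fraction=0.5):
    """Share of frames whose power is below fraction * mean power."""
    mean_power = sum(frame_powers) / len(frame_powers)
    threshold = fraction * mean_power
    low = sum(1 for p in frame_powers if p < threshold)
    return low / len(frame_powers)
```

This is a statistic over frames in time, whereas claim 28 of US8990073B2 groups frequency bands within a frame, so the mapping of "frames in said series" to "first frequency" is a term-level match only.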




US8990073B2

Filed: 2007-06-22     Issued: 2015-03-24

Method and device for sound activity detection and sound signal classification

(Original Assignee) VoiceAge Corp     (Current Assignee) Voiceage Evs LLC

Vladimir Malenovsky, Milan Jelinek, Tommy Vaillancourt, Redwan Salami
US5839101A

Filed: 1996-12-10     Issued: 1998-11-17

Noise suppressor and method for suppressing background noise in noisy speech, and a mobile station

(Original Assignee) Nokia Mobile Phones Ltd     (Current Assignee) Nokia Technologies Oy

Antti Vahatalo, Juha Hakkinen, Erkki Paajanen, Ville-Veikko Mattila
US8990073B2
CLAIM 1
. A method for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the method comprising : calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long term correlation map .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 2
. A method as defined in claim 1 , wherein calculating the current residual spectrum comprises : searching for the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

estimating the spectral floor by connecting the minima of the frequency spectrum with each other ;

and subtracting the estimated spectral floor from the frequency spectrum of the sound signal in the current frame so as to produce the current residual spectrum .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .
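
The residual-spectrum and peak-detection steps recited in claims 1 and 2 of US8990073B2 can be illustrated with a minimal Python sketch. It is not taken from either patent: the function names, the rule that both end bins count as minima, and the use of straight-line interpolation to "connect" the minima are assumptions made for illustration only.

```python
def local_minima(spectrum):
    """Return indices of local minima.

    Assumption: the first and last bins are always treated as minima so
    that every bin lies between two minima.
    """
    if len(spectrum) < 3:
        raise ValueError("need at least three frequency bins")
    idx = [0]
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] < spectrum[i - 1] and spectrum[i] <= spectrum[i + 1]:
            idx.append(i)
    idx.append(len(spectrum) - 1)
    return idx


def spectral_floor(spectrum, minima):
    """Connect successive minima with straight lines (assumed interpolation)."""
    floor = [0.0] * len(spectrum)
    for a, b in zip(minima, minima[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            floor[i] = spectrum[a] + t * (spectrum[b] - spectrum[a])
    return floor


def residual_spectrum(spectrum):
    """Subtract the spectral floor from the spectrum, bin by bin."""
    floor = spectral_floor(spectrum, local_minima(spectrum))
    return [s - f for s, f in zip(spectrum, floor)]


def detect_peaks(residual):
    """Return (start, end) bin pairs, one per piece between successive minima."""
    minima = local_minima(residual)
    return list(zip(minima, minima[1:]))


if __name__ == "__main__":
    spectrum = [1.0, 3.0, 1.0, 4.0, 2.0]
    residual = residual_spectrum(spectrum)
    print(residual)
    print(detect_peaks(residual))
```

Under these assumptions each detected peak is simply a pair of bin indices, which is the form the later correlation step needs. By contrast, the mapped claim 4 of US5839101A recites only determining mean noise and speech levels.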

US8990073B2
CLAIM 4
. A method as defined in claim 1 , wherein calculating the correlation map comprises : for each detected peak in the current residual spectrum , calculating a normalized correlation value with the previous residual spectrum , over frequency bins (noise estimation) between two consecutive minima in the current residual spectrum that delimit the peak ;

assigning a score to each detected peak , the score corresponding to the normalized correlation value ;

and for each detected peak , assigning the normalized correlation value of the peak over the frequency bins between the two consecutive minima that delimit the peak so as to form the correlation map .
US5839101A
CLAIM 10
. A noise suppressor according to claim 7 , wherein it comprises noise estimation (frequency bins) means for estimating the level of said noise and for storing the value of said level and that during each analyzed speech signal the value of a noise estimate is updated only if the voice activity detector has not detected speech during a certain time before and after each detected speech signal .

US8990073B2
CLAIM 5
. A method as defined in claim 1 , wherein calculating the long-term correlation map comprises : filtering the correlation map through a one-pole filter on a frequency bin by frequency bin basis ;

and summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
US5839101A
CLAIM 10
. A noise suppressor according to claim 7 , wherein it comprises noise estimation (frequency bins) means for estimating the level of said noise and for storing the value of said level and that during each analyzed speech signal the value of a noise estimate is updated only if the voice activity detector has not detected speech during a certain time before and after each detected speech signal .
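
Claims 4 and 5 of US8990073B2 recite a per-peak normalized correlation, a correlation map, and a one-pole filter applied bin by bin. The sketch below is one hedged reading of those steps. The exact normalization, the value of the update factor `alpha`, and all function names are assumptions, not details confirmed by the claim text.

```python
import math


def correlation_map(cur, prev, peaks):
    """Build a correlation map from per-peak normalized correlations.

    `peaks` is a list of (start, end) bin pairs delimiting each peak.
    Assumption: the score is sum(cur*prev) / sqrt(sum(cur^2) * sum(prev^2)).
    A bin shared by two adjacent peaks takes the score of the later peak.
    """
    cmap = [0.0] * len(cur)
    for a, b in peaks:
        bins = range(a, b + 1)
        num = sum(cur[i] * prev[i] for i in bins)
        den = math.sqrt(sum(cur[i] ** 2 for i in bins)
                        * sum(prev[i] ** 2 for i in bins))
        score = num / den if den > 0 else 0.0
        for i in bins:
            cmap[i] = score  # the same score is assigned across the whole peak
    return cmap


def update_long_term(lt_map, cmap, alpha=0.9):
    """One-pole filter per bin; alpha is a hypothetical update factor."""
    return [alpha * l + (1.0 - alpha) * c for l, c in zip(lt_map, cmap)]


if __name__ == "__main__":
    cur = [0.0, 2.0, 0.0, 2.5, 0.0]
    prev = [0.0, 1.8, 0.1, 2.4, 0.0]
    peaks = [(0, 2), (2, 4)]
    lt = [0.0] * len(cur)          # initial value of the long-term map
    lt = update_long_term(lt, correlation_map(cur, prev, peaks))
    print(sum(lt))                 # summed long-term correlation map
```

The summed value is the quantity that claim 8 compares with an adaptive threshold. The mapped claim 10 of US5839101A recites only storing and conditionally updating a noise level estimate.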

US8990073B2
CLAIM 6
. A method as defined in claim 1 , further comprising detecting strong tones in the sound signal (noise component) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 7
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises searching in the correlation map for frequency bins (noise estimation) having a magnitude that exceeds a given fixed threshold .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US5839101A
CLAIM 10
. A noise suppressor according to claim 7 , wherein it comprises noise estimation (frequency bins) means for estimating the level of said noise and for storing the value of said level and that during each analyzed speech signal the value of a noise estimate is updated only if the voice activity detector has not detected speech during a certain time before and after each detected speech signal .

US8990073B2
CLAIM 8
. A method as defined in claim 6 , wherein detecting the strong tones in the sound signal (noise component) comprises comparing the summed long-term correlation map with an adaptive threshold (sampling means) indicative of sound activity (noise component) in the sound signal .
US5839101A
CLAIM 3
. A noise suppressor according to claim 1 , wherein it comprises sampling means (adaptive threshold) for sampling the speech signal into samples in time domain , windowing means for framing samples into a frame , processing means for forming frequency domain components of said frame , that the spectrum forming means are arranged to form said spectrum components from the frequency domain components , that the recombination means are arranged to recombine the second amount of spectrum components into a calculation spectrum component representing said calculation signal , that the determination means comprise calculation means for calculating a suppression coefficient for said calculation spectrum component based upon noise contained in the latter , and that the suppression means comprise a multiplier for multiplying the frequency domain components corresponding to the spectrum components recombined into the calculation spectrum component by said suppression coefficient , in order to form noise-suppressed frequency domain components , and that it comprises means for converting said noise-suppressed frequency domain components into a time domain signal and for outputting it as a noise-suppressed output signal .

US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 10
. A method for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the method comprising : estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimation is performed according to claim 1 .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 11
. A method as defined in claim 10 , further comprising preventing update of noise energy estimates when a tonal sound signal (noise component) is detected .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 12
. A method as defined in claim 10 , wherein detecting the sound activity (noise component) in the sound signal (noise component) further comprises using a signal-to-noise ratio (SNR)-based sound activity detection .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 13
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises detecting the sound signal (noise component) based on a frequency dependent signal-to-noise ratio (SNR) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 14
. A method as defined in claim 12 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection comprises comparing an average signal-to-noise ratio (SNR av ) to a threshold calculated as a function of a long-term signal-to-noise ratio (SNR LT ) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 15
. A method as defined in claim 14 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection in the sound signal (noise component) further comprises using noise energy estimates calculated in a previous frame in a SNR calculation .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 16
. A method as defined in claim 15 , wherein using the signal-to-noise ratio (SNR)-based sound activity (noise component) detection further comprises updating the noise estimates for a next frame .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 17
. A method as defined in claim 16 , wherein updating the noise energy estimates for a next frame comprises calculating an update decision based on at least one of a pitch stability , a voicing , a non-stationarity parameter of the sound signal (noise component) and a ratio between a second order and a sixteenth order of linear prediction residual error energies .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 18
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an inactive sound signal when the average signal-to-noise ratio (SNR av ) is inferior to the calculated threshold .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 19
. A method as defined in claim 14 , comprising classifying the sound signal (noise component) as one of an inactive sound signal and active sound signal , which comprises determining an active sound signal when the average signal-to-noise ratio (SNR av ) is larger than the calculated threshold .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .
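
The decision rule recited in claims 14, 18 and 19 of US8990073B2 compares an average SNR with a threshold that depends on a long-term SNR. The sketch below shows that comparison only. The linear threshold function and its `slope` and `offset` values are hypothetical, since the claims do not state the functional form.

```python
def classify_frame(snr_av, snr_lt, slope=0.5, offset=10.0):
    """Classify a frame as 'active' or 'inactive'.

    Assumption: threshold = slope * snr_lt + offset (hypothetical form).
    The claims do not address the equality case, which is treated as
    inactive here.
    """
    threshold = slope * snr_lt + offset
    return "active" if snr_av > threshold else "inactive"


if __name__ == "__main__":
    print(classify_frame(snr_av=20.0, snr_lt=10.0))
    print(classify_frame(snr_av=12.0, snr_lt=10.0))
```

The mapped claim 4 of US5839101A recites noise and speech levels used for a suppression coefficient and does not recite a threshold of this kind.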

US8990073B2
CLAIM 20
. A method as defined in claim 10 , wherein estimating the parameter related to the tonal stability of the sound signal (noise component) prevents updating of noise energy estimates when a music signal is detected .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 24
. A method as defined in claim 23 , wherein calculating the spectral diversity parameter comprises : calculating a ratio between an energy of the sound signal (noise component) in a current frame and an energy of the sound signal in a previous frame , for frequency bands higher than a given number ;

and calculating the spectral diversity as a weighted sum of the computed ratio over all the frequency bands higher than the given number .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 25
. A method as defined in claim 22 , wherein calculating the complementary non-stationarity parameter further comprises calculating an activity prediction parameter indicative of an activity of the sound signal (noise component) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 26
. A method as defined in claim 25 , wherein calculating the activity prediction parameter comprises : calculating a long-term value of a binary decision obtained from estimating the parameter related to the tonal stability of the sound signal (noise component) and the complementary non-stationarity parameter .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 28
. A method as defined in claim 21 , wherein calculating the noise character parameter comprises : dividing a plurality of frequency bands into a first group of a certain number of first frequency (first frequency) bands and a second group of a rest of the frequency bands ;

calculating a first energy value for the first group of frequency bands and a second energy value of the second group of frequency bands ;

calculating a ratio between the first and second energy values so as to produce the noise character parameter ;

and calculating a long-term value of the noise character parameter based on the calculated noise character parameter .
US5839101A
CLAIM 1
. A noise suppressor for suppressing noise in a speech signal , which suppressor comprises means for dividing said speech signal in a first amount of subsignals , which subsignals represent certain first frequency (first frequency) ranges , and suppression means for suppressing noise in a subsignal based upon a determined suppression coefficient , wherein it additionally comprises recombination means for recombining a second amount of subsignals into a calculation signal which represents a certain second frequency range , which is wider than said first frequency ranges , determination means for determining a suppression coefficient for the calculation signal based upon noise contained in it , and that the suppression means are arranged to suppress the subsignals recombined into the calculation signal , with said suppression coefficient determined based upon the calculation signal .
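
The noise character parameter of claim 28 of US8990073B2 can be pictured as a ratio of two band-group energies followed by long-term smoothing. The sketch below is a hedged illustration. Summing band energies per group, the smoothing factor `alpha`, and the function names are assumptions that the claim text does not confirm.

```python
def noise_character(band_energies, n_first):
    """Ratio of first-group energy to second-group energy.

    Assumption: each group's energy value is the sum of its band energies.
    """
    first = sum(band_energies[:n_first])
    second = sum(band_energies[n_first:])
    return first / second if second > 0 else float("inf")


def long_term_value(previous, current, alpha=0.9):
    """Exponential smoothing; alpha is a hypothetical forgetting factor."""
    return alpha * previous + (1.0 - alpha) * current


if __name__ == "__main__":
    energies = [4.0, 4.0, 1.0, 1.0]
    ratio = noise_character(energies, n_first=2)
    print(ratio, long_term_value(0.0, ratio))
```

The mapped claim 1 of US5839101A divides a speech signal into subsignals covering first frequency ranges and recombines them, but does not recite an energy ratio between two band groups.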

US8990073B2
CLAIM 30
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : means for calculating a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

means for detecting a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

means for calculating a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and means for identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 31
. A device for estimating a tonal stability of a sound signal (noise component) using a frequency spectrum of the sound signal , the device comprising : a calculator of a current residual spectrum of the sound signal by subtracting from the frequency spectrum of the sound signal a spectral floor defined by minima of the frequency spectrum ;

a detector of a plurality of peaks in the current residual spectrum as pieces of the current residual spectrum between pairs of successive minima of the current residual spectrum ;

a calculator of a correlation map between each detected peak of the current residual spectrum and a shape in a previous residual spectrum corresponding to the position of the detected peak ;

and a calculator identifying the tonal stability of the sound signal based on calculating a long-term correlation map , wherein the long-term correlation map is calculated based on an update factor , the correlation map of a current frame , and an initial value of the long-term correlation map .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 32
. A device as defined in claim 31 , wherein the calculator of the current residual spectrum comprises : a locator of the minima in the frequency spectrum of the sound signal (noise component) in the current frame ;

an estimator of the spectral floor which connects the minima of the frequency spectrum with each other ;

and a subtractor of the estimated spectral floor from the frequency spectrum so as to produce the current residual spectrum .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 33
. A device as defined in claim 31 , wherein the calculator of the long-term correlation map comprises : a filter for filtering the correlation map on a frequency bin by frequency bin basis ;

and an adder for summing the filtered correlation map over the frequency bins (noise estimation) so as to produce a summed long-term correlation map .
US5839101A
CLAIM 10
. A noise suppressor according to claim 7 , wherein it comprises noise estimation (frequency bins) means for estimating the level of said noise and for storing the value of said level and that during each analyzed speech signal the value of a noise estimate is updated only if the voice activity detector has not detected speech during a certain time before and after each detected speech signal .

US8990073B2
CLAIM 34
. A device as defined in claim 31 , further comprising a detector of strong tones in the sound signal (noise component) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 35
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : means for estimating a parameter related to a tonal stability of the sound signal used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability parameter estimation means comprises a device according to claim 30 .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 36
. A device for detecting sound activity (noise component) in a sound signal (noise component) , wherein the sound signal is classified as one of an inactive sound signal and an active sound signal according to the detected sound activity in the sound signal , the device comprising : a tonal stability estimator of the sound signal , used for distinguishing a music signal from a background noise signal ;

wherein the tonal stability estimator comprises a device according to claim 31 .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 37
. A device as defined in claim 36 , further comprising a signal-to-noise ratio (SNR)-based sound activity (noise component) detector .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 38
. A device as defined in claim 37 , wherein the (SNR)-based sound activity (noise component) detector comprises a comparator of an average signal (noise component) to noise ratio (SNR av ) with a threshold which is a function of a long-term signal to noise ratio (SNR LT ) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 39
. A device as defined in claim 37 , further comprising a noise estimator (domain signal) for updating noise energy estimates in a calculation of a signal-to-noise ratio (SNR) in the SNR-based sound activity (noise component) detector .
US5839101A
CLAIM 3
. A noise suppressor according to claim 1 , wherein it comprises sampling means for sampling the speech signal into samples in time domain , windowing means for framing samples into a frame , processing means for forming frequency domain components of said frame , that the spectrum forming means are arranged to form said spectrum components from the frequency domain components , that the recombination means are arranged to recombine the second amount of spectrum components into a calculation spectrum component representing said calculation signal , that the determination means comprise calculation means for calculating a suppression coefficient for said calculation spectrum component based upon noise contained in the latter , and that the suppression means comprise a multiplier for multiplying the frequency domain components corresponding to the spectrum components recombined into the calculation spectrum component by said suppression coefficient , in order to form noise-suppressed frequency domain components , and that it comprises means for converting said noise-suppressed frequency domain components into a time domain signal (noise estimator) and for outputting it as a noise-suppressed output signal .

US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 40
. A device as defined in claim 36 , further comprising a calculator of a complementary non-stationarity parameter and a calculator of a noise character of the sound signal (noise component) for distinguishing a music signal from a background noise signal and preventing update of noise energy estimates .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .

US8990073B2
CLAIM 41
. A device as defined in claim 36 , further comprising a calculator of a spectral parameter used for detecting spectral changes and spectral attacks in the sound signal (noise component) .
US5839101A
CLAIM 4
. A noise suppressor according to claim 3 , wherein said calculation means comprise means for determining the mean level of a noise component (average signal, sound signal, sound activity, sound activity detection, sound activity detector, detecting sound activity, sound signal prevents updating) and a speech component contained in the input signal and means for calculating the suppression coefficient for said calculation spectrum component , based upon said noise and speech levels .