Glossary
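- Front end: see frontend below
- vocoder: a speech synthesizer that reconstructs the waveform from acoustic parameters
- MFCC: Mel-Frequency Cepstral Coefficients
- RBM: Restricted Boltzmann Machine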
- bap band aperiodicity
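- ASR: Automatic Speech Recognition
- AM: acoustic model
- LM: language model
- HMM: Hidden Markov Model; the output (observation) sequence consists of the feature vectors describing the speech, and the state sequence represents the corresponding text
- HTS: HMM-based Speech Synthesis System, a speech synthesis toolkit
- HTK: Hidden Markov Model Toolkit, a speech recognition toolkit
- autoencoder
- SPTK: Speech Signal Processing Toolkit
- SPSS: statistical parametric speech synthesis
- pitch: how high or low a sound is, corresponding to its (fundamental) frequency
- Timbre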
- Zero Crossing Rate 过零率
- Volume 音量
- sil silence
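- syllable
- intonation: intonation or tone; the rise and fall of the voice in speech
- POS part of speech
- mgc: mel-generalized cepstrum
- mcep: mel-cepstrum (mel-cepstral representation)
- mcc: mel-cepstral coefficients
- mfcc: Mel-Frequency Cepstral Coefficients
- LSP: Line Spectral Pair
- Naming conventions for multi-phone units:
  - monophone: a single phone
  - biphone / diphone: two phones
  - triphone: three phones
  - quadphone: four phones
- utterance: a stretch of speech; the act of speaking
- ToBI (Tones and Break Indices): a prosodic annotation system for English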
- CD-DNN-HMM(Context-Dependent DNN-HMM)
- frontend :The part of a TTS system that transforms plain text into a linguistic representation is called a frontend
- .wpa word to phonetic alphabet
- .cmp Composed acoustic features
- .scp script file (a list of file paths passed to HTK tools)
- .mlf master label file
- .pam phonetic alphabets to model
- .mgc mel generalized cepstral feature
- .lf0 log F0, a representation of pitch; pitch is expressed through the fundamental frequency
- .utt .utt files are the linguistic representation of the text that Festival outputs(full context training labels)
- .cfg configuration file
- initial && final: the initial (声母) and final (韵母) of a Mandarin syllable
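Several entries above (Zero Crossing Rate, Volume, .lf0) name simple frame-level quantities. The following is a minimal sketch of how they might be computed, assuming a mono signal given as a list of floats. The function names, the dB definition of volume, the use of the natural log for log F0, and the unvoiced sentinel value are illustrative assumptions, not the conventions of HTS, SPTK, or any other toolkit.

```python
import math
from typing import List, Sequence

# Hypothetical sentinel marking unvoiced frames in a log-F0 track (assumption;
# toolkits differ, so check the convention of the one you use).
UNVOICED_LF0 = -1.0e10


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into (possibly overlapping) frames, dropping any incomplete tail."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(x[i:i + frame_len]) for i in range(0, len(x) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def volume_db(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Volume as frame energy in dB: 10 * log10(sum of squared samples)."""
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(energy + eps)  # eps avoids log10(0) on silent frames


def f0_to_lf0(f0: Sequence[float]) -> List[float]:
    """Natural-log F0 for voiced frames (F0 > 0); unvoiced frames get the sentinel."""
    return [math.log(v) if v > 0 else UNVOICED_LF0 for v in f0]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate in Hz
    # 0.1 s of a 200 Hz sine, split into 25 ms frames with a 10 ms hop
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 10)]
    for frame in frame_signal(tone, frame_len=400, hop=160)[:3]:
        print(round(zero_crossing_rate(frame), 4), round(volume_db(frame), 2))
    print(f0_to_lf0([0.0, 100.0, 200.0]))
```

Voiced speech tends to show a low zero crossing rate and high energy, while unvoiced fricatives and silence show the opposite pattern, which is why these two features are often used together for simple voice activity or voicing decisions.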
Abbreviations (from reference [5])
- AM Acoustic Model
- ACR Absolute Category Rating
- ASR Automatic Speech Recognition
- CART Classification and Regression Tree
- CCR Comparison Category Rating
- CFHMM Continuous F0 HMM, continuous F0 model
- CMLLR Constrained Maximum Likelihood Linear Regression
- CMOS Comparison Mean Opinion Score
- CORC Correlation Coefficient
- CR Command-Response
- CSMAPLR Constrained Structural Maximum A Posteriori Linear Regression
- DBN Dynamic Bayesian Network
- DCR Degradation Category Rating
- DCT Discrete Cosine Transform
- DMOS Degradation Mean Opinion Score
- ED Emotion Dependent
- EM Expectation Maximization
- F0 Fundamental Frequency
- GMM Gaussian Mixture Model
- GTD Global Tied Distribution
- HMM Hidden Markov Model
- HNR Harmonics-to-Noise Ratio
- HSS HMM-based Speech Synthesis
- HSMM Hidden Semi-Markov Model
- HTK HMM Tool Kit
- HTS HMM-based Speech Synthesis System
- LPC Linear Prediction Coefficient
- MAP Maximum A Posteriori
- MCD Mel-Cepstral Distortion
- MDL Minimum Description Length
- MDS Multi-Dimensional Scaling
- MGCC Mel-Generalized Cepstral Coefficient
- MLI Maximum Likelihood Increase
- MLSA Mel Log Spectral Approximation
- MLLR Maximum Likelihood Linear Regression
- MLPG Maximum Likelihood Parameter Generation
- MOS Mean Opinion Score
- MSD Multi-Space Distribution
- PiTAR Pitch Target Realisation
- PM Prosodic Model
- RMSE Root-Mean-Square Error
- SA Speaker Adaptation
- SI Speaker Independent
- SMAP Structural Maximum A Posteriori
- SMAPLR Structural Maximum A Posteriori Linear Regression
- SPTK Speech Signal Processing Toolkit
- SSM Supra-Segmental Model
- SSML Speech Synthesis Markup Language
- TA Target Approximation
- ToBI Tones and Break Indices
- TTS Text-To-Speech
- VC Voice Conversion
- VFS Vector Field Smoothing
- VPR Voice Print Recognition
- VTLN Vocal Tract Length Normalization
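The abbreviation list includes several objective evaluation measures (MCD, RMSE, CORC). A minimal sketch of them follows, assuming the reference and synthesized parameter sequences are already time-aligned frame by frame (for example via DTW or by synthesizing with natural durations). Excluding the 0th (energy) cepstral coefficient from MCD and scoring F0 only on frames voiced in both tracks are common conventions assumed here; all function names are hypothetical. The MCD per frame is (10 / ln 10) * sqrt(2 * Σ_d (c_d − ĉ_d)²), averaged over frames.

```python
import math
from typing import List, Sequence, Tuple


def mel_cepstral_distortion(ref: Sequence[Sequence[float]],
                            syn: Sequence[Sequence[float]],
                            skip_c0: bool = True) -> float:
    """Mean per-frame MCD in dB between two time-aligned mel-cepstral sequences."""
    if not ref or len(ref) != len(syn):
        raise ValueError("expected two non-empty, time-aligned sequences")
    start = 1 if skip_c0 else 0  # c0 (energy) is commonly excluded (assumption)
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        sq = sum((a - b) ** 2 for a, b in zip(r[start:], s[start:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(ref)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if not x or len(x) != len(y):
        raise ValueError("expected two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def correlation_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (CORC) between two sequences."""
    if len(x) < 2 or len(x) != len(y):
        raise ValueError("expected two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def voiced_frames(f0_ref: Sequence[float],
                  f0_syn: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Keep only frames where both F0 tracks are voiced (F0 > 0)."""
    pairs = [(a, b) for a, b in zip(f0_ref, f0_syn) if a > 0 and b > 0]
    return [a for a, _ in pairs], [b for _, b in pairs]


if __name__ == "__main__":
    ref_f0, syn_f0 = voiced_frames([0.0, 110.0, 120.0, 130.0],
                                   [0.0, 112.0, 118.0, 0.0])
    print(rmse(ref_f0, syn_f0))  # prints 2.0 (F0 RMSE in Hz over voiced frames)
```

Lower MCD and F0 RMSE, and a CORC closer to 1, indicate synthesized parameters closer to the reference; these objective scores complement, rather than replace, listening tests such as MOS or CMOS.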