Glossary

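  • Front end
  • vocoder: voice coder, a speech analysis/synthesis device
  • MFCC
  • Restricted Boltzmann Machine (RBM)
  • bap: band aperiodicity
  • ASR: Automatic Speech Recognition
  • AM: Acoustic Model
  • LM: Language Model
  • HMM: Hidden Markov Model. The output sequence describes the feature vectors of the speech, and the state sequence represents the corresponding text
  • HTS: HMM-based Speech Synthesis System, a speech synthesis toolkit
  • HTK: Hidden Markov Model Toolkit, a speech recognition toolkit
  • Autoencoder
  • SPTK: Speech Signal Processing Toolkit
  • SPSS: Statistical Parametric Speech Synthesis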
  • pitch: how high or low the (fundamental) frequency of a sound is
  • Timbre: tone color
  • Zero Crossing Rate
  • Volume: loudness
  • sil: silence
  • syllable
  • intonation: tone, the rise and fall of the voice
  • POS: part of speech
  • mgc
  • mcep: Mel-Generalized Cepstral Representation
  • mcc: Mel Cepstral Coefficients
  • mfcc: Mel Frequency Cepstral Coefficients
  • LSP: Line Spectral Pair parameters
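  • Naming conventions for multi-phoneme units
    • monophone: one phoneme
    • biphone, diphone: two phonemes
    • triphone: three phonemes
    • quadphone: four phonemes
  • utterance: a stretch of speech, a vocalization
  • ToBI (Tones and Break Indices): a prosody annotation system for English
  • CD-DNN-HMM (Context-Dependent DNN-HMM)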
  • frontend: the part of a TTS system that transforms plain text into a linguistic representation
  • .wpa: word to phonetic alphabet
  • .cmp: composed acoustic features
  • .scp: system control program
  • .mlf: master label file
  • .pam: phonetic alphabets to model
  • .mgc: mel generalized cepstral feature
  • .lf0: log F0, a representation of pitch (pitch is expressed through the fundamental frequency)
  • .utt: the linguistic representation of the text that Festival outputs (full-context training labels)
  • .cfg
  • initial and final: the initial consonant (shengmu) and the final (yunmu) of a Chinese syllable
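
Three of the signal-level terms above (Zero Crossing Rate, Volume, and the log F0 stored in .lf0 files) are simple enough to illustrate numerically. The following is a minimal sketch in pure Python, not the method of any particular toolkit: the function names, the use of RMS as the volume measure, and the synthetic 100 Hz test tone are all assumptions made for illustration.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def rms_volume(frame):
    """Root-mean-square amplitude, used here as one possible volume measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def log_f0(f0_hz):
    """Natural log of a voiced F0 value in Hz (unvoiced frames are not handled)."""
    if f0_hz <= 0:
        raise ValueError("log F0 is only defined for voiced frames (F0 > 0)")
    return math.log(f0_hz)

# Hypothetical test signal: 0.1 s of a 100 Hz sine sampled at 8 kHz.
# The small phase offset keeps samples away from exact zeros.
SAMPLE_RATE = 8000
TONE_HZ = 100.0
frame = [
    0.5 * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE + 0.1)
    for n in range(800)
]

print("zero crossing rate:", zero_crossing_rate(frame))
print("RMS volume:", rms_volume(frame))
print("log F0:", log_f0(TONE_HZ))
```

A pure tone crosses zero twice per period, so the zero crossing rate here is roughly 2 × 100 / 8000; noisy or unvoiced frames typically give much higher values.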

List of abbreviations (taken from reference [5])

  • AM: Acoustic Model
  • ACR: Absolute Category Rating
  • ASR: Automatic Speech Recognition
  • CART: Classification and Regression Tree
  • CCR: Comparison Category Rating
  • CFHMM: Continuous F0 HMM (continuous F0 model)
  • CMLLR: Constrained Maximum Likelihood Linear Regression
  • CMOS: Comparison Mean Opinion Score
  • CORC: Correlation Coefficient
  • CR: Command-Response
  • CSMAPLR: Constrained Structural Maximum A Posteriori Linear Regression
  • DBN: Dynamic Bayesian Network
  • DCR: Degradation Category Rating
  • DCT: Discrete Cosine Transform
  • DMOS: Degradation Mean Opinion Score
  • ED: Emotion Dependent
  • EM: Expectation Maximization
  • F0: Fundamental Frequency
  • GMM: Gaussian Mixture Model
  • GTD: Global Tied Distribution
  • HMM: Hidden Markov Model
  • HNR: Harmonics-to-Noise Ratio
  • HSS: HMM-based Speech Synthesis
  • HSMM: Hidden Semi-Markov Model
  • HTK: HMM Toolkit
  • HTS: HMM-based Speech Synthesis System
  • LPC: Linear Prediction Coefficient
  • MAP: Maximum A Posteriori
  • MCD: Mel-Cepstral Distortion
  • MDL: Minimum Description Length
  • MDS: Multi-Dimensional Scaling
  • MGCC: Mel-Generalized Cepstral Coefficient
  • MLI: Maximum Likelihood Increase
  • MLSA: Mel Log Spectrum Approximation
  • MLLR: Maximum Likelihood Linear Regression
  • MLPG: Maximum Likelihood Parameter Generation
  • MOS: Mean Opinion Score
  • MSD: Multi-Space Distribution
  • PiTAR: Pitch Target Realisation
  • PM: Prosodic Model
  • RMSE: Root Mean Square Error
  • SA: Speaker Adaptation
  • SI: Speaker Independent
  • SMAP: Structural Maximum A Posteriori
  • SMAPLR: Structural Maximum A Posteriori Linear Regression
  • SPTK: Speech Signal Processing Toolkit
  • SSM: Supra-Segmental Model
  • SSML: Speech Synthesis Markup Language
  • TA: Target Approximation
  • ToBI: Tones and Break Indices
  • TTS: Text-To-Speech
  • VC: Voice Conversion
  • VFS: Vector Field Smoothing
  • VPR: Voice Print Recognition
  • VTLN: Vocal Tract Length Normalization
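
Two of the objective measures listed above, RMSE and MCD, can be written down in a few lines. This is a hedged sketch rather than the definition used in reference [5]: it assumes the commonly used form of mel-cepstral distortion, (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²) averaged over frames, with the 0th (energy) coefficient excluded and the frames already time-aligned. The function names and the toy values are hypothetical.

```python
import math

def rmse(reference, estimate):
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))

def mel_cepstral_distortion(ref_frames, est_frames):
    """Mean MCD in dB over aligned frames, skipping coefficient 0 (assumption)."""
    if len(ref_frames) != len(est_frames):
        raise ValueError("frame sequences must be time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        squared = sum((r - e) ** 2 for r, e in zip(ref[1:], est[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)

# Toy F0 contours in Hz and toy mel-cepstra (made-up numbers).
f0_ref = [100.0, 200.0]
f0_est = [110.0, 190.0]
mcep_ref = [[0.0, 1.0, 0.0]]
mcep_est = [[5.0, 0.0, 0.0]]

print("F0 RMSE (Hz):", rmse(f0_ref, f0_est))
print("MCD (dB):", mel_cepstral_distortion(mcep_ref, mcep_est))
```

In practice, F0 RMSE is usually computed only over frames that are voiced in both contours, and MCD is computed after aligning the synthesized and natural utterances (for example with dynamic time warping); neither step is shown here.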