VOICEVOX API

Overview

VOICEVOX is a Japanese text-to-speech engine that runs locally. It exposes an HTTP API; this document uses it to generate Zundamon (ずんだもん) speech from text.

Prerequisites

Item	Value
Endpoint	`http://localhost:50021` (the engine must be running)
Zundamon speaker ID	3 (ノーマル / Normal)
Output format	WAV (16-bit PCM, 24 kHz, mono by default)

Zundamon styles

Speaker ID	Style
3	ずんだもん (ノーマル / Normal)
1	ずんだもん (あまあま / Sweet)
7	ずんだもん (ツンツン / Tsuntsun)
5	ずんだもん (セクシー / Sexy)
22	ずんだもん (ささやき / Whisper)
38	ずんだもん (ヒソヒソ / Hushed)
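Each style has its own ID, and that ID is what the `speaker` parameter expects. A running engine reports its installed characters and styles at `GET /speakers`, so you can check the IDs against the table instead of hard-coding them. The sketch below is a minimal example under some assumptions. `Speaker` and `SpeakerStyle` describe only the fields it reads from that response. `findStyleId` and `fetchSpeakers` are hypothetical helper names.

```typescript
// Minimal shapes for the parts of the GET /speakers response used here
// (assumption: each speaker has a name and a list of styles with name + id).
interface SpeakerStyle {
  name: string;
  id: number;
}

interface Speaker {
  name: string;
  styles: SpeakerStyle[];
}

// Hypothetical helper: return the style ID for (speakerName, styleName), or undefined.
function findStyleId(
  speakers: Speaker[],
  speakerName: string,
  styleName: string,
): number | undefined {
  const speaker = speakers.find((s) => s.name === speakerName);
  return speaker?.styles.find((st) => st.name === styleName)?.id;
}

// Hypothetical helper: fetch the speaker list from a running engine.
async function fetchSpeakers(base = 'http://localhost:50021'): Promise<Speaker[]> {
  const res = await fetch(`${base}/speakers`);
  if (!res.ok) throw new Error(`GET /speakers failed: ${res.status}`);
  return (await res.json()) as Speaker[];
}
```

For example, `findStyleId(await fetchSpeakers(), 'ずんだもん', 'ささやき')` should return the ID listed above if the installed voice library matches this table.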

API flow

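1. Create an audio query

POST /audio_query?text={text}&speaker=3

`text` must be URL-encoded (UTF-8). No request body is needed. The response is an AudioQuery JSON object, which you pass to the next step.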

Example request:

```bash
curl -s -X POST "http://localhost:50021/audio_query?speaker=3" \
  --get --data-urlencode "text=こんにちは" \
  -o audio_query.json
```
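When you build the same request in code, percent-encode the text instead of pasting raw Japanese into the URL. Here is a minimal sketch using the standard `URL` API, which is available in Node.js 18+ and browsers. `buildAudioQueryUrl` is a hypothetical helper.

```typescript
// Build the /audio_query URL with percent-encoded query parameters.
// Note: the leading "/" resolves against the origin, so a path prefix on `base` is dropped.
function buildAudioQueryUrl(base: string, text: string, speaker: number): string {
  const url = new URL('/audio_query', base);
  url.searchParams.set('text', text); // UTF-8 percent escapes for Japanese text
  url.searchParams.set('speaker', String(speaker));
  return url.toString();
}
```

Send the resulting URL with `method: 'POST'` and no body.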

Response (AudioQuery; excerpt showing only the first two moras of the first accent phrase):

```json
{
  "accent_phrases": [
    {
      "moras": [
        {
          "text": "コ",
          "consonant": "k",
          "consonant_length": 0.065,
          "vowel": "o",
          "vowel_length": 0.112,
          "pitch": 5.86
        },
        {
          "text": "ン",
          "consonant": null,
          "consonant_length": null,
          "vowel": "N",
          "vowel_length": 0.089,
          "pitch": 5.92
        }
      ],
      "accent": 3,
      "pause_mora": null
    }
  ],
  "speedScale": 1.0,
  "pitchScale": 0.0,
  "intonationScale": 1.0,
  "volumeScale": 1.0,
  "prePhonemeLength": 0.1,
  "postPhonemeLength": 0.1,
  "outputSamplingRate": 24000,
  "outputStereo": false
}
```
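The lengths in an AudioQuery are in seconds, so you can estimate the utterance length before synthesizing by summing them. This sketch assumes `speedScale` divides every length uniformly, including the leading and trailing silence. The engine also rounds to audio frames, so the real WAV can differ slightly. `AudioQueryTiming` and `estimateDurationSec` are hypothetical names.

```typescript
// Minimal AudioQuery shape needed for the estimate (field names as in the response above).
interface Mora {
  consonant_length: number | null;
  vowel_length: number;
}

interface AccentPhrase {
  moras: Mora[];
  pause_mora: Mora | null;
}

interface AudioQueryTiming {
  accent_phrases: AccentPhrase[];
  speedScale: number;
  prePhonemeLength: number;
  postPhonemeLength: number;
}

// Consonant + vowel length of one mora in seconds (the consonant is null for e.g. ン).
function moraSeconds(m: Mora): number {
  return (m.consonant_length ?? 0) + m.vowel_length;
}

// Estimated duration in seconds, assuming speedScale divides every length uniformly.
function estimateDurationSec(q: AudioQueryTiming): number {
  let total = q.prePhonemeLength + q.postPhonemeLength;
  for (const phrase of q.accent_phrases) {
    for (const m of phrase.moras) total += moraSeconds(m);
    if (phrase.pause_mora) total += moraSeconds(phrase.pause_mora);
  }
  return total / q.speedScale;
}
```

For the two-mora excerpt above, this gives 0.1 + (0.065 + 0.112) + 0.089 + 0.1 = 0.466 s. A full query for こんにちは has more moras and is correspondingly longer.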

2. Synthesize audio

POST /synthesis?speaker=3
Content-Type: application/json
Body: AudioQuery (the response from step 1, sent as-is or with edited parameters)

Example request:

```bash
curl -X POST "http://localhost:50021/synthesis?speaker=3" \
  -H "Content-Type: application/json" \
  -d @audio_query.json \
  --output output.wav
```

Response: WAV binary data
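You can read the format from the WAV header to check what the engine actually returned, for example after changing `outputSamplingRate` or `outputStereo`. This sketch walks the standard RIFF chunks (`fmt `, `data`) rather than assuming a fixed 44-byte header. `readWavFormat` and `wavDurationSec` are hypothetical helpers, and the sketch expects uncompressed PCM only.

```typescript
// Format fields from a RIFF/WAVE "fmt " chunk plus the size of the "data" chunk.
interface WavFormat {
  audioFormat: number; // 1 = integer PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
  dataBytes: number;
}

// Read a 4-character chunk ID as ASCII.
function fourCC(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3),
  );
}

// Walk the chunks after the 12-byte RIFF header until the data chunk is found.
function readWavFormat(buf: ArrayBuffer): WavFormat {
  const view = new DataView(buf);
  if (fourCC(view, 0) !== 'RIFF' || fourCC(view, 8) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let fmt: Omit<WavFormat, 'dataBytes'> | undefined;
  let offset = 12;
  while (offset + 8 <= view.byteLength) {
    const id = fourCC(view, offset);
    const size = view.getUint32(offset + 4, true); // chunk sizes are little-endian
    const body = offset + 8;
    if (id === 'fmt ') {
      fmt = {
        audioFormat: view.getUint16(body, true),
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    } else if (id === 'data') {
      if (!fmt) throw new Error('data chunk before fmt chunk');
      return { ...fmt, dataBytes: size };
    }
    offset = body + size + (size % 2); // chunks are padded to an even size
  }
  throw new Error('missing fmt or data chunk');
}

// Playback length in seconds implied by the header.
function wavDurationSec(f: WavFormat): number {
  return f.dataBytes / (f.sampleRate * f.channels * (f.bitsPerSample / 8));
}
```

With the default settings, you would expect `audioFormat` 1, `channels` 1, `sampleRate` 24000 and `bitsPerSample` 16.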

TypeScript client implementation

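```typescript
// lib/voicevox.ts
// Requires a runtime with a global fetch (Node.js 18+ or a browser).

const VOICEVOX_BASE = 'http://localhost:50021';
const ZUNDAMON_SPEAKER_ID = 3; // ずんだもん (ノーマル)

// The engine may return more fields than listed here; they survive the round trip
// because the parsed object is sent back to /synthesis unchanged.
export interface AudioQuery {
  accent_phrases: AccentPhrase[];
  speedScale: number;
  pitchScale: number;
  intonationScale: number;
  volumeScale: number;
  prePhonemeLength: number;
  postPhonemeLength: number;
  outputSamplingRate: number;
  outputStereo: boolean;
}

export interface AccentPhrase {
  moras: Mora[];
  accent: number;
  pause_mora: Mora | null;
}

export interface Mora {
  text: string;
  consonant: string | null;
  consonant_length: number | null;
  vowel: string;
  vowel_length: number;
  pitch: number;
}

// One entry of the phoneme timeline (times in seconds from the start of the audio).
export interface PhonemeEntry {
  phoneme: string;
  start: number;
  end: number;
}

// Throw on non-2xx responses instead of treating an error body as data.
async function ensureOk(res: Response, label: string): Promise<void> {
  if (!res.ok) {
    throw new Error(`${label} failed: ${res.status} ${await res.text()}`);
  }
}

// Get an audio synthesis query for the text
export async function createAudioQuery(
  text: string,
  speaker: number = ZUNDAMON_SPEAKER_ID,
): Promise<AudioQuery> {
  const res = await fetch(
    `${VOICEVOX_BASE}/audio_query?text=${encodeURIComponent(text)}&speaker=${speaker}`,
    { method: 'POST' }
  );
  await ensureOk(res, 'audio_query');
  return (await res.json()) as AudioQuery;
}

// Synthesize audio and get the WAV binary
export async function synthesize(
  query: AudioQuery,
  speaker: number = ZUNDAMON_SPEAKER_ID,
): Promise<ArrayBuffer> {
  const res = await fetch(
    `${VOICEVOX_BASE}/synthesis?speaker=${speaker}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(query),
    }
  );
  await ensureOk(res, 'synthesis');
  return res.arrayBuffer();
}

// Build a phoneme timeline from the lengths in the query.
// Assumption: every length (including leading/trailing silence) is divided by
// speedScale; the engine rounds to audio frames, so real timings can differ slightly.
export function extractPhonemes(query: AudioQuery): PhonemeEntry[] {
  const entries: PhonemeEntry[] = [];
  let t = 0;
  const push = (phoneme: string, seconds: number) => {
    const d = seconds / query.speedScale;
    entries.push({ phoneme, start: t, end: t + d });
    t += d;
  };
  push('pau', query.prePhonemeLength);
  for (const phrase of query.accent_phrases) {
    // A pause_mora (if any) follows the phrase's moras.
    const moras = phrase.pause_mora ? [...phrase.moras, phrase.pause_mora] : phrase.moras;
    for (const mora of moras) {
      if (mora.consonant !== null && mora.consonant_length !== null) {
        push(mora.consonant, mora.consonant_length);
      }
      push(mora.vowel, mora.vowel_length);
    }
  }
  push('pau', query.postPhonemeLength);
  return entries;
}

// Generate audio from text and also return the phoneme timeline
export async function generateVoice(text: string): Promise<{
  audio: ArrayBuffer;
  phonemes: PhonemeEntry[];
}> {
  const query = await createAudioQuery(text);
  const audio = await synthesize(query);
  const phonemes = extractPhonemes(query);
  return { audio, phonemes };
}
```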
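A phoneme timeline is most useful when you look it up against the current playback position, for example to sync visuals with the audio. Here is a minimal sketch. It assumes entries shaped as `{ phoneme, start, end }` in seconds, sorted and non-overlapping. `PhonemeEntry` is defined locally here as a hypothetical shape, and `phonemeAt` is a hypothetical helper.

```typescript
// Hypothetical timeline entry: phoneme label with start/end times in seconds.
interface PhonemeEntry {
  phoneme: string;
  start: number;
  end: number;
}

// Binary search for the entry whose [start, end) interval contains t.
// Assumes entries are sorted by start and do not overlap.
function phonemeAt(timeline: PhonemeEntry[], t: number): PhonemeEntry | undefined {
  let lo = 0;
  let hi = timeline.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const e = timeline[mid];
    if (t < e.start) hi = mid - 1;
    else if (t >= e.end) lo = mid + 1;
    else return e;
  }
  return undefined; // t is before the first entry, after the last, or in a gap
}
```

In a browser, you could call `phonemeAt(phonemes, audioElement.currentTime)` on each animation frame (`audioElement` being a hypothetical `<audio>` element playing the WAV). Binary search keeps each lookup O(log n).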

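Tuning voice parameters

Parameter	Description	Recommended value
`speedScale`	Speech rate (1.0 = normal; higher is faster)	1.0 to 1.2
`pitchScale`	Pitch shift (0.0 = unchanged)	0.0
`intonationScale`	Intonation strength (1.0 = normal)	1.0 to 1.2
`volumeScale`	Volume (1.0 = normal)	1.0
`prePhonemeLength`	Silence before speech (seconds)	0.1
`postPhonemeLength`	Silence after speech (seconds)	0.1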
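Because `/synthesis` accepts any AudioQuery, you can change these parameters on the query between the two requests. Here is a minimal sketch of applying overrides without mutating the original query. `VoiceParams` and `withVoiceParams` are hypothetical names. The only checks are that `speedScale` is positive (it scales durations) and that the silence lengths are non-negative; they are not the engine's actual accepted ranges.

```typescript
// The tunable top-level AudioQuery parameters from the table above.
interface VoiceParams {
  speedScale: number;
  pitchScale: number;
  intonationScale: number;
  volumeScale: number;
  prePhonemeLength: number;
  postPhonemeLength: number;
}

// Return a shallow copy of the query with selected parameters overridden.
// Generic over Q so other fields (accent_phrases, outputSamplingRate, ...) are kept.
function withVoiceParams<Q extends VoiceParams>(query: Q, overrides: Partial<VoiceParams>): Q {
  const next = { ...query, ...overrides };
  if (!(next.speedScale > 0)) throw new Error('speedScale must be positive');
  if (next.prePhonemeLength < 0 || next.postPhonemeLength < 0) {
    throw new Error('silence lengths must be non-negative');
  }
  return next;
}
```

For example: `synthesize(withVoiceParams(await createAudioQuery(text), { speedScale: 1.1, intonationScale: 1.1 }))`.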
