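Subtitle Generation Pipeline

Overview

This document defines the end-to-end pipeline that generates explanatory subtitles matching the content of screen-recording videos, synthesizes speech for them with VOICEVOX, and renders the result with Remotion.

Pipeline Overview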

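[1] Write the subtitle script (manually or with AI)
         │
         v
[2] VOICEVOX voice synthesis (batch)
         │
         v
[3] Timing adjustment based on audio length
         │
         v
[4] Phoneme data generation for lip sync
         │
         v
[5] Remotion input data integration
         │
         v
[6] Render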

[1] Creating the Subtitle Script

Input format: `script.json`

json

{
  "title": "画面収録解説動画",
  "videos": [
    {
      "file": "video1.mp4",
      "duration": 301
    },
    {
      "file": "video2.mp4",
      "duration": 413
    }
  ],
  "segments": [
    {
      "id": 1,
      "videoIndex": 0,
      "startTime": 0.0,
      "text": "はい、それではこの画面収録の内容を解説していくのだ"
    },
    {
      "id": 2,
      "videoIndex": 0,
      "startTime": 8.0,
      "text": "まずはこのファイルを開いて作業を始めるのだ"
    }
  ]
}

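Key points

startTime is the time offset within the video, in seconds.
endTime is computed automatically after voice synthesis.
videoIndex selects the target video by its index in videos.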
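
Because every later stage trusts these fields, it can help to check the script before synthesis starts. The following is a minimal sketch, not part of the original pipeline: the type names and the `validateScript` helper are hypothetical, and the field shapes are taken from the `script.json` example above.

```typescript
// Shapes mirror the script.json example; endTime is absent until synthesis.
interface VideoEntry {
  file: string;
  duration: number; // seconds
}

interface ScriptSegment {
  id: number;
  videoIndex: number;
  startTime: number; // seconds, relative to the start of videos[videoIndex]
  endTime?: number; // filled in after voice synthesis
  text: string;
}

interface Script {
  title: string;
  videos: VideoEntry[];
  segments: ScriptSegment[];
}

// Hypothetical helper: returns human-readable problems (empty when valid).
function validateScript(script: Script): string[] {
  const problems: string[] = [];
  const seenIds = new Set<number>();

  for (const seg of script.segments) {
    if (seenIds.has(seg.id)) problems.push(`segment ${seg.id}: duplicate id`);
    seenIds.add(seg.id);

    const video = script.videos[seg.videoIndex];
    if (video === undefined) {
      problems.push(`segment ${seg.id}: videoIndex ${seg.videoIndex} is out of range`);
      continue; // the remaining checks need a valid video
    }
    if (seg.startTime < 0 || seg.startTime >= video.duration) {
      problems.push(`segment ${seg.id}: startTime ${seg.startTime} is outside ${video.file}`);
    }
    if (seg.text.trim() === '') problems.push(`segment ${seg.id}: text is empty`);
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a script author fix everything in one pass.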

[2] VOICEVOX Batch Voice Synthesis

Processing script: `scripts/generate-voices.ts`

typescript

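import { createAudioQuery, synthesize } from '../lib/voicevox';
import { writeFileSync, mkdirSync } from 'fs';
import script from '../data/script.json';

// NOTE: extractPhonemes and getWavDuration are not defined in this snippet;
// they are assumed to be project helpers that are in scope.

// Segment shape from script.json; endTime is filled in during synthesis
interface ScriptSegment {
  id: number;
  videoIndex: number;
  startTime: number;
  text: string;
  endTime?: number;
}

async function generateAllVoices() {
  mkdirSync('public/voices', { recursive: true });

  const segments: ScriptSegment[] = script.segments;

  for (const segment of segments) {
    console.log(`[${segment.id}] Synthesizing: "${segment.text}"`);

    // 1. Build the AudioQuery
    const query = await createAudioQuery(segment.text);

    // 2. Synthesize the audio
    const audio = await synthesize(query);

    // 3. Write the WAV file
    const filename = `segment-${String(segment.id).padStart(3, '0')}.wav`;
    writeFileSync(`public/voices/${filename}`, Buffer.from(audio));

    // 4. Save the phoneme data
    const phonemes = extractPhonemes(query);
    // ... append to phoneme-timeline.json

    // 5. Measure the audio length and derive endTime
    //    (endTime only lives in memory here; persist it if later steps need it)
    const audioDuration = getWavDuration(audio);
    segment.endTime = segment.startTime + audioDuration;

    console.log(`  → ${filename} (${audioDuration.toFixed(2)}s)`);
  }
}

generateAllVoices().catch((err) => {
  console.error(err);
  process.exit(1);
});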
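
The script above calls `getWavDuration` without showing it. One way to implement it, sketched here under the assumption that `synthesize` returns an `ArrayBuffer` holding a standard RIFF/WAVE file, is to divide the size of the `data` chunk by the byte rate stored in the `fmt ` chunk. The implementation below is illustrative and is not the project's actual helper.

```typescript
// Hypothetical getWavDuration: walks the RIFF chunks and divides the size of
// the "data" chunk by the byte rate (bytes per second) from the "fmt " chunk.
function getWavDuration(audio: ArrayBuffer): number {
  const view = new DataView(audio);
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3)
    );

  if (view.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }

  let byteRate = 0;
  let offset = 12; // the first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    if (id === 'fmt ') {
      // Chunk body: format(2) channels(2) sampleRate(4) byteRate(4) ...
      byteRate = view.getUint32(offset + 16, true);
    } else if (id === 'data') {
      if (byteRate === 0) throw new Error('fmt chunk missing before data chunk');
      // Clamp in case the header overstates the payload size
      const available = view.byteLength - (offset + 8);
      return Math.min(size, available) / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('data chunk not found');
}
```

Walking the chunks rather than reading fixed offsets keeps the function working when extra chunks (for example `LIST`) appear before `data`.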

[3] Automatic Timing Adjustment

Once voice synthesis has fixed each segment's endTime, later segments are shifted so that none overlaps the one before it.

typescript

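function adjustTimings(segments: SubtitleSegment[]): SubtitleSegment[] {
  const MIN_GAP = 0.5; // seconds of silence kept between segments

  // Only the array is copied; the segment objects themselves are mutated
  const sorted = [...segments].sort(
    (a, b) => a.videoIndex - b.videoIndex || a.startTime - b.startTime
  );

  for (let i = 0; i < sorted.length - 1; i++) {
    const current = sorted[i];
    const next = sorted[i + 1];

    if (current.videoIndex === next.videoIndex) {
      // Check for overlap within the same video
      if (current.endTime > next.startTime) {
        // Delay the next segment and move its endTime by the same amount,
        // so its duration is preserved for the following comparison
        const shift = current.endTime + MIN_GAP - next.startTime;
        next.startTime += shift;
        next.endTime += shift;
      }
    }
  }

  return sorted;
}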
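
To confirm that an adjustment pass really removed every overlap, a small check can be run over its output. This is a sketch only: `TimedSegment` and `findOverlaps` are hypothetical names, and the segment shape is reduced to the fields the check needs.

```typescript
// Minimal segment shape for this check; the project's SubtitleSegment may
// carry more fields.
interface TimedSegment {
  id: number;
  videoIndex: number;
  startTime: number;
  endTime: number;
}

// Hypothetical check: returns [earlierId, laterId] for each pair of
// neighbouring segments in the same video whose time ranges overlap.
function findOverlaps(segments: TimedSegment[]): Array<[number, number]> {
  const sorted = [...segments].sort(
    (a, b) => a.videoIndex - b.videoIndex || a.startTime - b.startTime
  );
  const overlaps: Array<[number, number]> = [];
  for (let i = 0; i < sorted.length - 1; i++) {
    const current = sorted[i];
    const next = sorted[i + 1];
    if (current.videoIndex === next.videoIndex && current.endTime > next.startTime) {
      overlaps.push([current.id, next.id]);
    }
  }
  return overlaps;
}
```

An empty result means no segment starts before its predecessor in the same video has finished.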

[4] Lip-Sync Data Integration

The phoneme data extracted from the VOICEVOX AudioQuery is converted into the format used by the Remotion composition.

typescript

interface PhonemeTimeline {
  segmentId: number;
  // Times relative to the start of the segment
  phonemes: PhonemeEntry[];
  // Absolute times relative to the video (for Remotion)
  absolutePhonemes: PhonemeEntry[];
}

function toAbsoluteTimeline(
  timeline: PhonemeTimeline,
  segment: SubtitleSegment
): PhonemeTimeline {
  return {
    ...timeline,
    absolutePhonemes: timeline.phonemes.map(p => ({
      ...p,
      time: p.time + segment.startTime,
    })),
  };
}
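
The segment-relative times have to come from the AudioQuery itself. As a rough sketch of how `extractPhonemes` might derive them, the code below sums mora lengths from the query. It makes two assumptions the source does not confirm: that `PhonemeEntry` has `time`, `duration`, and `vowel` fields (in seconds), and that every length, including the leading silence, is divided by `speedScale`. The field names `accent_phrases`, `moras`, `pause_mora`, `consonant_length`, `vowel_length`, `speedScale`, and `prePhonemeLength` come from the VOICEVOX AudioQuery schema.

```typescript
// Subset of the VOICEVOX AudioQuery fields this sketch reads.
interface Mora {
  consonant_length?: number | null;
  vowel: string; // e.g. "a", "i", "u", "e", "o", "N", "cl", "pau"
  vowel_length: number;
}
interface AccentPhrase {
  moras: Mora[];
  pause_mora?: Mora | null;
}
interface AudioQueryLike {
  accent_phrases: AccentPhrase[];
  speedScale: number;
  prePhonemeLength: number;
}

// Assumed shape of PhonemeEntry (time and duration in seconds).
interface PhonemeEntry {
  time: number;
  duration: number;
  vowel: string;
}

function extractPhonemes(query: AudioQueryLike): PhonemeEntry[] {
  const entries: PhonemeEntry[] = [];
  const scale = 1 / query.speedScale; // assumption: all lengths scale with speed
  let cursor = query.prePhonemeLength * scale; // leading silence

  for (const phrase of query.accent_phrases) {
    const moras = phrase.pause_mora ? [...phrase.moras, phrase.pause_mora] : phrase.moras;
    for (const mora of moras) {
      // Skip over the consonant so the mouth shape starts on the vowel
      cursor += (mora.consonant_length ?? 0) * scale;
      const duration = mora.vowel_length * scale;
      entries.push({ time: cursor, duration, vowel: mora.vowel });
      cursor += duration;
    }
  }
  return entries;
}
```

Pause moras are kept as `pau` entries so the composition can close the mouth during silences instead of holding the previous vowel.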

[5] Remotion Input Data Integration

Finally, the data passed to the Remotion composition is gathered into a single file.

`remotion-input.json`

json

{
  "composition": {
    "width": 1920,
    "height": 1080,
    "fps": 30,
    "totalDurationFrames": 21420
  },
  "videos": [
    {
      "file": "videos/video1.mp4",
      "startFrame": 0,
      "durationFrames": 9030
    },
    {
      "file": "videos/video2.mp4",
      "startFrame": 9030,
      "durationFrames": 12390
    }
  ],
  "subtitles": [
    {
      "id": 1,
      "startFrame": 0,
      "endFrame": 120,
      "text": "はい、それではこの画面収録の内容を解説していくのだ"
    }
  ],
  "voices": [
    {
      "file": "voices/segment-001.wav",
      "startFrame": 0
    }
  ],
  "phonemes": [
    {
      "segmentId": 1,
      "entries": [
        { "frame": 0, "duration": 5, "vowel": "a" },
        { "frame": 5, "duration": 4, "vowel": "i" }
      ]
    }
  ]
}
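
The frame values above follow from the durations in `script.json` at 30 fps: 301 s × 30 = 9030 frames, 413 s × 30 = 12390 frames, and 9030 + 12390 = 21420 frames in total. The conversion could be sketched as follows. `buildTimeline` and `toFrames` are hypothetical names, and both the back-to-back video layout and the `videos/` path prefix are assumptions read off the example.

```typescript
interface VideoEntry {
  file: string;
  duration: number; // seconds
}
interface TimedSegment {
  id: number;
  videoIndex: number;
  startTime: number; // seconds, relative to its video
  endTime: number;
  text: string;
}

// Round to the nearest whole frame
const toFrames = (seconds: number, fps: number): number => Math.round(seconds * fps);

function buildTimeline(videos: VideoEntry[], segments: TimedSegment[], fps: number) {
  // Videos play back to back, so each one starts where the previous one ended
  let cursor = 0;
  const videoFrames = videos.map((v) => {
    const entry = {
      file: `videos/${v.file}`, // assumed public-folder layout
      startFrame: cursor,
      durationFrames: toFrames(v.duration, fps),
    };
    cursor += entry.durationFrames;
    return entry;
  });

  // Segment times are relative to their video, so offset by its start frame
  const subtitles = segments.map((s) => {
    const offset = videoFrames[s.videoIndex].startFrame;
    return {
      id: s.id,
      startFrame: offset + toFrames(s.startTime, fps),
      endFrame: offset + toFrames(s.endTime, fps),
      text: s.text,
    };
  });

  return { totalDurationFrames: cursor, videos: videoFrames, subtitles };
}
```

Converting to frames once, at this stage, keeps rounding in a single place instead of scattering it across components.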

[6] Running the Render

CLI Commands

bash

# Preview
npx remotion preview

# Render
npx remotion render MainComposition output/final.mp4

# High-quality render
npx remotion render MainComposition output/final.mp4 \
  --codec h264 \
  --crf 18 \
  --fps 30

Running Programmatically

typescript

import { bundle } from '@remotion/bundler';
import { renderMedia, selectComposition } from '@remotion/renderer';

async function render() {
  const bundled = await bundle({
    entryPoint: './src/index.ts',
  });

  const composition = await selectComposition({
    serveUrl: bundled,
    id: 'MainComposition',
  });

  await renderMedia({
    composition,
    serveUrl: bundled,
    codec: 'h264',
    outputLocation: 'output/final.mp4',
  });
}

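Error Handling

Error	Cause	Fix
VOICEVOX connection error	Engine not running	`docker run -p 50021:50021 voicevox/voicevox_engine`
Voice synthesis timeout	Text is too long	Split the text so each sentence is at most 50 characters
FFmpeg error	FFmpeg not installed	`brew install ffmpeg`
Remotion out of memory	Video is too large	Render with `--concurrency 1`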
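
The 50-character limit for synthesis can be enforced before the text reaches VOICEVOX. The sketch below uses a hypothetical `splitForSynthesis` helper that breaks after sentence-ending punctuation and, as a fallback, hard-wraps any run that reaches the limit without one. The set of punctuation marks is an assumption.

```typescript
// Hypothetical helper: cuts after sentence-ending punctuation, and also cuts
// whenever the current chunk reaches maxLength characters.
function splitForSynthesis(text: string, maxLength = 50): string[] {
  const enders = new Set(['。', '！', '？', '!', '?']);
  const chunks: string[] = [];
  let current: string[] = [];

  const flush = () => {
    const chunk = current.join('').trim();
    if (chunk.length > 0) chunks.push(chunk);
    current = [];
  };

  // Array.from iterates by code point, so surrogate pairs count as one character
  for (const ch of Array.from(text)) {
    current.push(ch);
    if (enders.has(ch) || current.length >= maxLength) flush();
  }
  flush(); // emit whatever is left after the last punctuation mark
  return chunks;
}
```

Each returned chunk would become its own segment, so its `startTime` still has to be assigned, for example by chaining the chunks one after another once their audio lengths are known.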
