Data Flow

Processing Pipeline

Phase 1: Video Preprocessing

Input: testData/画面収録 *.mov (two screen recordings)
  │
  ├─ ffprobe: read metadata
  │   └─ resolution, FPS, duration, codec
  │
  ├─ ffmpeg: format conversion
  │   ├─ MOV → MP4 (H.264)
  │   ├─ normalize resolution to 1920x1080
  │   └─ normalize frame rate to 30 fps
  │
  └─ Output: public/videos/video1.mp4, video2.mp4
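The two Phase 1 steps can be sketched as TypeScript helpers that build argument arrays for the `ffprobe` and `ffmpeg` CLIs (for example, to pass to Node's `child_process.spawn`). This is a minimal sketch, not the project's actual script. The helper names, the `-show_entries` selection, and the letterboxing strategy (scale to fit inside 1920x1080, then pad to center) are assumptions. The original only states the target format.

```typescript
// Metadata Phase 1 reads: resolution, FPS, duration, codec.
interface VideoMetadata {
  width: number;
  height: number;
  fps: number;
  durationSec: number;
  codec: string;
}

// Hypothetical helper: arguments for
//   ffprobe -v error -select_streams v:0 -show_entries ... -of json <file>
function ffprobeArgs(input: string): string[] {
  return [
    "-v", "error",
    "-select_streams", "v:0",
    "-show_entries", "stream=codec_name,width,height,r_frame_rate:format=duration",
    "-of", "json",
    input,
  ];
}

// Parse ffprobe's JSON output. r_frame_rate is a fraction string such as "60/1"
// or "30000/1001"; format.duration is a decimal string such as "12.345000".
function parseFfprobe(json: string): VideoMetadata {
  const data = JSON.parse(json);
  const stream = data.streams?.[0];
  if (!stream) throw new Error("no video stream found");
  const [num, den] = String(stream.r_frame_rate).split("/").map(Number);
  return {
    width: Number(stream.width),
    height: Number(stream.height),
    fps: den ? num / den : num,
    durationSec: Number(data.format?.duration),
    codec: String(stream.codec_name),
  };
}

// Hypothetical helper: MOV -> MP4 (H.264), fit into 1920x1080 without distortion,
// and resample to a constant 30 fps.
function ffmpegNormalizeArgs(input: string, output: string): string[] {
  const vf = [
    // Scale to fit inside 1920x1080, keeping the aspect ratio and even dimensions
    // (libx264 with yuv420p rejects odd widths/heights).
    "scale=1920:1080:force_original_aspect_ratio=decrease:force_divisible_by=2",
    "pad=1920:1080:(ow-iw)/2:(oh-ih)/2", // center the picture on a 1920x1080 canvas
    "fps=30",                            // constant 30 fps output
  ].join(",");
  return [
    "-y", "-i", input,
    "-vf", vf,
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-c:a", "aac",
    output,
  ];
}
```

Usage in Node might look like `spawn("ffmpeg", ffmpegNormalizeArgs(src, "public/videos/video1.mp4"), { stdio: "inherit" })`. Padding instead of cropping keeps the whole screen recording visible when its aspect ratio is not 16:9. `force_divisible_by` requires a reasonably recent ffmpeg (4.3 or later).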

Phase 2: Subtitle Script Preparation

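The subtitle script is defined in JSON. Each segment specifies the video it belongs to (`videoIndex`), its start and end times in seconds, the subtitle text, and the speaker.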

```json
{
  "segments": [
    {
      "id": 1,
      "videoIndex": 0,
      "startTime": 0.0,
      "endTime": 5.0,
      "text": "まずは画面の構成を確認するのだ",
      "speaker": "zundamon"
    },
    {
      "id": 2,
      "videoIndex": 0,
      "startTime": 5.5,
      "endTime": 10.0,
      "text": "ここでファイルを開いているのだ",
      "speaker": "zundamon"
    }
  ]
}
```
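It may help to sanity-check the script before synthesizing any audio. The following is a minimal sketch using a hypothetical `validateSegments` helper. It assumes that segments on the same video should be sorted and must not overlap. The original does not state this rule.

```typescript
interface SubtitleSegment {
  id: number;
  videoIndex: number;
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
  speaker: "zundamon";
}

// Hypothetical checks. Returns a list of problems; an empty list means the script passed.
function validateSegments(segments: SubtitleSegment[], videoCount: number): string[] {
  const errors: string[] = [];
  const seenIds = new Set<number>();
  const lastEnd = new Map<number, number>(); // videoIndex -> endTime of the previous segment

  for (const s of segments) {
    if (seenIds.has(s.id)) errors.push(`segment ${s.id}: duplicate id`);
    seenIds.add(s.id);

    if (s.videoIndex < 0 || s.videoIndex >= videoCount) {
      errors.push(`segment ${s.id}: videoIndex ${s.videoIndex} out of range`);
    }
    if (!(s.endTime > s.startTime)) {
      errors.push(`segment ${s.id}: endTime must be greater than startTime`);
    }
    if (s.text.trim() === "") errors.push(`segment ${s.id}: empty text`);

    // Assumed rule: segments on the same video are in order and do not overlap.
    const prev = lastEnd.get(s.videoIndex);
    if (prev !== undefined && s.startTime < prev) {
      errors.push(`segment ${s.id}: overlaps the previous segment on video ${s.videoIndex}`);
    }
    lastEnd.set(s.videoIndex, s.endTime);
  }
  return errors;
}
```

With the two segments from the example above and `videoCount = 2`, the helper returns an empty list.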

Phase 3: VOICEVOX Speech Synthesis

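Subtitle script (JSON)
  │
  ├─ For each segment:
  │   │
  │   ├─ POST /audio_query?text=<segment text>&speaker=3
  │   │   (text and speaker are query parameters; no request body.
  │   │    speaker 3 = Zundamon, Normal style)
  │   │   response: AudioQuery (includes mora/phoneme information)
  │   │
  │   ├─ POST /synthesis?speaker=3
  │   │   body: AudioQuery (JSON)
  │   │   response: WAV binary
  │   │
  │   └─ Extract phoneme timing
  │       └─ AudioQuery.accent_phrases[].moras[]
  │           ├─ text: "ま"
  │           ├─ consonant: "m", consonant_length (seconds)
  │           ├─ vowel: "a"
  │           └─ vowel_length: 0.15 (seconds)
  │
  └─ Output:
      ├─ public/voices/segment-001.wav
      ├─ public/voices/segment-002.wav
      └─ src/data/phoneme-timeline.json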
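The two VOICEVOX calls per segment can be sketched with `fetch` as follows. This is a minimal sketch, not the project's client. It assumes a local VOICEVOX engine at its default address `http://127.0.0.1:50021`. The constant and function names are hypothetical.

```typescript
// Assumption: VOICEVOX engine running locally on its default port.
const VOICEVOX_BASE = "http://127.0.0.1:50021";
const ZUNDAMON_NORMAL = 3; // speaker (style) ID used by this pipeline

// /audio_query takes text and speaker as query parameters, not as a JSON body.
function audioQueryUrl(base: string, text: string, speaker: number): string {
  const url = new URL("/audio_query", base);
  url.searchParams.set("text", text);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

// /synthesis also needs the speaker as a query parameter; the AudioQuery is the body.
function synthesisUrl(base: string, speaker: number): string {
  const url = new URL("/synthesis", base);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

// Synthesize one segment. Returns the AudioQuery (used later for phoneme timing)
// and the WAV bytes.
async function synthesizeSegment(
  text: string,
  speaker: number = ZUNDAMON_NORMAL,
  base: string = VOICEVOX_BASE,
): Promise<{ query: unknown; wav: ArrayBuffer }> {
  const queryRes = await fetch(audioQueryUrl(base, text, speaker), { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: HTTP ${queryRes.status}`);
  const query: unknown = await queryRes.json();

  const synthRes = await fetch(synthesisUrl(base, speaker), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!synthRes.ok) throw new Error(`synthesis failed: HTTP ${synthRes.status}`);
  return { query, wav: await synthRes.arrayBuffer() };
}
```

In Node, the WAV could then be saved with `fs.writeFileSync("public/voices/segment-001.wav", Buffer.from(wav))`. The same `query` object also feeds the phoneme-timeline step, so each segment only needs one `/audio_query` round trip.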

Phase 4: Phoneme Timeline

Phoneme timeline data for the lip-sync (mouth-flap) animation, with `time` and `duration` in seconds:

```json
{
  "segments": [
    {
      "segmentId": 1,
      "phonemes": [
        { "time": 0.0, "duration": 0.15, "vowel": "a" },
        { "time": 0.15, "duration": 0.12, "vowel": "u" },
        { "time": 0.27, "duration": 0.10, "vowel": "silent" },
        { "time": 0.37, "duration": 0.14, "vowel": "a" }
      ]
    }
  ]
}
```
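At render time, the mouth shape for the current frame can be looked up from this timeline. This is a minimal sketch with hypothetical helper names. It assumes `time` is measured from the start of the segment's audio and that entries are sorted by `time`, so a binary search works. In a Remotion component, the frame would come from the `useCurrentFrame()` hook.

```typescript
interface PhonemeEntry {
  time: number;     // start (seconds)
  duration: number; // length (seconds)
  vowel: "a" | "i" | "u" | "e" | "o" | "N" | "silent";
}

// Vowel active at `seconds`, or "silent" if no entry covers that instant.
// Entries are assumed sorted by time and non-overlapping.
function vowelAt(phonemes: PhonemeEntry[], seconds: number): PhonemeEntry["vowel"] {
  let lo = 0;
  let hi = phonemes.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const p = phonemes[mid];
    if (seconds < p.time) hi = mid - 1;
    else if (seconds >= p.time + p.duration) lo = mid + 1;
    else return p.vowel; // p.time <= seconds < p.time + p.duration
  }
  return "silent";
}

// Frame-based wrapper. Inside a component one might compute
//   const frame = useCurrentFrame() - segmentStartFrame;
// where segmentStartFrame is a hypothetical per-segment offset.
function vowelAtFrame(phonemes: PhonemeEntry[], frame: number, fps = 30): PhonemeEntry["vowel"] {
  return vowelAt(phonemes, frame / fps);
}
```

The component can then map each vowel (and `"silent"`) to a mouth image. Binary search keeps the per-frame cost logarithmic even for long segments.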

Phase 5: Remotion Rendering

Input data:
  ├─ video1.mp4, video2.mp4 (preprocessed videos)
  ├─ segment-*.wav (voice audio files)
  ├─ script.json (subtitle script)
  ├─ phoneme-timeline.json (lip-sync data)
  │
  ├─ Create Remotion bundle: bundle()
  │   └─ bundles the components with webpack
  │
  ├─ selectComposition()
  │   └─ id: "MainComposition" (fps: 30 is set on the <Composition>)
  │
  ├─ renderMedia()
  │   ├─ composition: result of selectComposition()
  │   ├─ codec: "h264"
  │   └─ outputLocation: "output/final.mp4"
  │
  └─ Output: output/final.mp4
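Remotion compositions are frame-based, so the second-based times in the script have to be converted to frames at the composition's 30 fps before each segment is placed, for example with Remotion's `<Sequence from={…} durationInFrames={…}>`. This is a minimal sketch using a hypothetical `toSequenceTimings` helper. It assumes `startTime`/`endTime` are already on the final video's timeline. Any per-video offset would have to be added first.

```typescript
// Only the fields needed for timing; SubtitleSegment objects are structurally compatible.
interface TimedSegment {
  id: number;
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface SequenceTiming {
  id: number;
  from: number;             // first frame of the segment
  durationInFrames: number; // length in frames (at least 1)
}

// Convert seconds to frames by rounding to the nearest frame at the given fps.
function toSequenceTimings(segments: TimedSegment[], fps = 30): SequenceTiming[] {
  return segments.map((s) => {
    const from = Math.round(s.startTime * fps);
    const end = Math.round(s.endTime * fps);
    // Remotion sequences need a positive duration, so clamp to one frame.
    return { id: s.id, from, durationInFrames: Math.max(1, end - from) };
  });
}
```

For the Phase 2 example, segment 1 maps to frames 0 to 150 (`from: 0, durationInFrames: 150`) and segment 2 to `from: 165, durationInFrames: 135`.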

Data Type Definitions

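```typescript
// Subtitle segment
interface SubtitleSegment {
  id: number;
  videoIndex: number;     // 0: video 1, 1: video 2
  startTime: number;      // seconds
  endTime: number;        // seconds
  text: string;           // subtitle text
  speaker: 'zundamon';    // character
}

// Phoneme entry
interface PhonemeEntry {
  time: number;           // start time (seconds)
  duration: number;       // length (seconds)
  vowel: 'a' | 'i' | 'u' | 'e' | 'o' | 'N' | 'silent';
}

// Mora in a VOICEVOX AudioQuery
interface Mora {
  text: string;
  consonant?: string | null;         // e.g. "m"; null for vowel-only moras
  consonant_length?: number | null;  // seconds
  vowel: string;                     // e.g. "a", "N", "cl", "pau"; devoiced vowels are uppercase
  vowel_length: number;              // seconds
  pitch: number;
}

// Composition props
interface CompositionProps {
  videos: string[];               // preprocessed video paths
  subtitles: SubtitleSegment[];
  voices: string[];               // one WAV path per segment
  phonemes: PhonemeEntry[][];     // one phoneme list per segment
}
```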
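These types connect in the phoneme-extraction step, where an AudioQuery returned by `POST /audio_query` is turned into `PhonemeEntry[]`. This is a minimal sketch under stated assumptions. Each mora is treated as lasting `consonant_length + vowel_length` and is drawn with its vowel's mouth shape. Leading silence comes from `prePhonemeLength`, and `speedScale` is assumed to be 1 (other speeds would rescale every length). `AccentPhrase` and `AudioQuery` here are trimmed-down subsets of the engine's schema, and the function names are hypothetical.

```typescript
type Vowel = "a" | "i" | "u" | "e" | "o" | "N" | "silent";

interface PhonemeEntry { time: number; duration: number; vowel: Vowel }

interface Mora {
  text: string;
  consonant?: string | null;
  consonant_length?: number | null;
  vowel: string;
  vowel_length: number;
  pitch: number;
}

// Subset of the AudioQuery fields used here.
interface AccentPhrase { moras: Mora[]; pause_mora?: Mora | null }
interface AudioQuery { accent_phrases: AccentPhrase[]; prePhonemeLength: number }

const MOUTH_VOWELS = new Set<string>(["a", "i", "u", "e", "o", "N"]);

// "N" (moraic nasal) stays as-is; other uppercase vowels are devoiced and are
// lower-cased; anything else ("cl", "pau") closes the mouth.
function toVowel(v: string): Vowel {
  const normalized = v === "N" ? v : v.toLowerCase();
  return MOUTH_VOWELS.has(normalized) ? (normalized as Vowel) : "silent";
}

// Walk moras in order, accumulating start times. Assumes speedScale = 1.
function audioQueryToPhonemes(q: AudioQuery): PhonemeEntry[] {
  const out: PhonemeEntry[] = [];
  let t = q.prePhonemeLength; // silence before the first mora
  const push = (m: Mora) => {
    const duration = (m.consonant_length ?? 0) + m.vowel_length;
    out.push({ time: t, duration, vowel: toVowel(m.vowel) });
    t += duration;
  };
  for (const phrase of q.accent_phrases) {
    phrase.moras.forEach(push);
    if (phrase.pause_mora) push(phrase.pause_mora); // pause between phrases
  }
  return out;
}
```

Folding the consonant into the vowel's mouth shape is a simplification. It keeps the timeline aligned with the audio while still using only the seven mouth states of `PhonemeEntry`.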
