Building an Interactive PV with the TextAlive App API and three.js - A Beginner's Tutorial for the Magical Mirai 2020 Programming Contest

Oct 04, 2020

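The cover image shows the demo built in this tutorial, set to グリーンライツ・セレナーデ. Versions set to ブレス・ユア・ブレス and 愛されなくても君がいる are also available.

TextAlive App API and the Magical Mirai 2020 Programming Contest

The TextAlive App API is a data-access tool for song PVs and MVs that runs in the browser. It exposes the lyrics, melody, visuals, and related information for the current playback position, and it can be combined with other web libraries to build an interactive PV like the demo above.

Magical Mirai 2020 Programming Contest (マジカルミライ 2020 プログラミング・コンテスト) is a programming contest that Magical Mirai is holding for the first time this year. Entrants use the TextAlive App API to create a web app for the designated songs and submit it.

The designated songs include the Magical Mirai theme songs from the past three years, and an entry must work with at least one of them.

A submission must be a static web page and must not involve any build step such as Gulp or Webpack. Entries are submitted by uploading to a private GitHub repository, sharing it with the official account, and filling in a form. An additional demo video may also be recorded.

This tutorial focuses on using the API. For everything else, such as the judging criteria, see the official contest site or the translated information from 未來群像.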

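Prerequisites

Before working through this article, make sure you have programming experience with ES6 or later.

Knowledge of a JS 3D library such as three.js or A-Frame is not required; references are listed at the end if you need them.

Some familiarity with basic music theory helps, and references for that are also listed at the end.

Environment Setup

TextAlive App API

The TextAlive App API can be loaded from the official CDN or installed yourself.

CDN: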

<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<script src="https://unpkg.com/textalive-app-api/dist/index.js"></script>
<script>
  const { Player } = TextAliveApp;
</script>

npm:

npm install textalive-app-api

import { Player } from "textalive-app-api";

yarn:

yarn add textalive-app-api

import { Player } from "textalive-app-api";

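After loading, the Player object has to be initialized. Initialization of the TextAlive App API has two main stages:

Creating the API Object

Once the API object has been created, the onAppReady event fires. Only then can the video URL be specified.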

const player = new Player({ app: true });
player.addListener({
   onAppReady: () => {
      //App object is ready
      //...
   }
});

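Loading the Video and Lyric Data

After the API object is ready, load the video with createFromSongUrl; the lyric data is handled by the TextAlive server.

player.createFromSongUrl("https://www.youtube.com/watch?v=XSLhsjepelI"); // グリーンライツ・セレナーデ / Omoi feat. 初音ミク

Then use addListener to listen for onTimerReady, which confirms that all data has loaded and the player is ready to start playback.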

player.addListener({
   onTimerReady: () => {
      //video ready
      //...
   }
});

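Event firing order table, from the official TextAlive App API documentation.

onTimerReady is used here because the official TextAlive examples use several events (onAppReady, onVideoReady, onTimerReady, onTimeUpdate, onPlay, onPause, onMediaSeek) without fully describing when or why each one fires.

From my own testing, the behavior is roughly as follows:

onAppReady fires first. Only after it fires can you load a video or perform any operation other than addListener.
onVideoReady is the first event to fire after a video is specified. It does not mean the "video" itself has finished loading, only that the video's information is valid. Starting playback at this point can trigger a bug in which the TextAlive App API and the YouTube API fall out of sync: the TextAlive timeline runs normally while the YouTube video stays stuck loading.
onTimerReady appears to mean that the TextAlive App API and the YouTube API have synchronized successfully; the timer here is presumably the TextAlive App API's internal timer. What is certain is that it fires only once after a video is loaded, and that playback can safely start at any time afterwards.
onTimeUpdate fires at irregular intervals. My guess is that these are updates scheduled internally by the TextAlive App API, so the reported time probably lags slightly behind the actual video time. This event also fires noticeably less often than the browser renders, so keeping your own timer is a good way to raise the frame rate.
onPlay fires when the video starts playing.
onPause fires after requestPause, requestStop, or the end of the video. The reason is undocumented.
onMediaSeek fires after requestMediaSeek changes the video time, and also after the video ends. Again, the reason is undocumented.

Once the video has loaded you will see a small YouTube frame with an endless loading spinner, which means it worked. You can now start playback with player.requestPlay().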
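
The ordering described above suggests gating playback on onTimerReady rather than onVideoReady. The following is a minimal sketch of that idea, not part of the official API: createLifecycleGate and fakePlayer are hypothetical names introduced for illustration, and the only player method assumed is requestPlay(), which this article already uses.

```javascript
// Hypothetical helper: remembers which lifecycle events have fired so that
// playback is only requested once onTimerReady has been seen.
function createLifecycleGate(player) {
  const state = { appReady: false, videoReady: false, timerReady: false };
  const listener = {
    onAppReady() { state.appReady = true; },
    onVideoReady() { state.videoReady = true; },
    onTimerReady() { state.timerReady = true; },
  };
  return {
    listener,
    state,
    // Returns true only if playback was actually requested.
    play() {
      if (!state.timerReady) return false;
      player.requestPlay();
      return true;
    },
  };
}

// Stand-in for the real Player object, for illustration only.
const fakePlayer = { calls: 0, requestPlay() { this.calls += 1; } };
const gate = createLifecycleGate(fakePlayer);

gate.listener.onAppReady();
gate.listener.onVideoReady();
console.log(gate.play()); // false: the timer is not ready yet

gate.listener.onTimerReady();
console.log(gate.play()); // true
```

With the real API, the listener object would be registered through player.addListener(gate.listener).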

three.js

Tutorials for three.js are plentiful, so the basics are not repeated here.

import * as THREE from "three"; // or the CDN build: https://cdnjs.cloudflare.com/ajax/libs/three.js/121/three.min.js

// Create the three.js scene
const scene = new THREE.Scene();
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
renderer.setClearColor(0xffffff, 1);
document.getElementById("player-outer").appendChild(renderer.domElement);

const camera = new THREE.PerspectiveCamera(
  70,
  window.innerWidth / window.innerHeight,
  0.1,
  1000
);
// Initial camera position
camera.position.set(-1.2, 0, 5);

// Always look straight at the origin (0, 0, 0)
camera.lookAt(scene.position);

// A little ambient light to make debugging easier
const ambientLight = new THREE.AmbientLight(0x404040);
scene.add(ambientLight);
render();

function render() {
  // playerProgress is the app's own state object, filled in by the player event handlers
  // Skip rendering while paused so the canvas keeps its last frame
  if (playerProgress.isPlaying) {
    // Actual render
    renderer.render(scene, camera);
    requestAnimationFrame(render);
  }
}

Displaying Lyrics

Lyrics can be obtained in two ways:

After onVideoReady fires, use player.video.firstPhrase, or firstWord/firstChar.

// Walk through the lyrics phrase by phrase
// phrase.text: the phrase's text
// phrase.startTime: time at which the phrase appears, in ms
// phrase.animate: callback invoked when the phrase appears; assign your own function
// phrase.firstWord: first word of the phrase
// phrase.next: next phrase, may be null
// phrase.previous: previous phrase, may be null
let phrase = player.video.firstPhrase;
while (phrase) {
  // Walk through the words of the phrase; each word can be broken down further into chars
  let wordsOfPhrase = [];

  // word.text: the word's text
  // word.startTime: time at which the word appears, in ms
  // word.animate: callback invoked when the word appears; assign your own function
  // word.firstChar: first character of the word
  // word.next: next word, may be null
  // word.previous: previous word, may be null
  // word.pos: part-of-speech tag of the word:
  //   N: 名詞 (Noun)
  //   PN: 代名詞 (ProNoun)
  //   V: 動詞 (Verb)
  //   R: 副詞 (adveRb)
  //   J: 形容詞 (adJective)
  //   A: 連体詞 (Adnominal adjective)
  //   P: 助詞 (Particle)
  //   M: 助動詞 (Modal)
  //   W: 疑問詞 (Wh)
  //   D: 冠詞 (Determiner)
  //   I: 接続詞 (conjunction)
  //   U: 感動詞 (Interjection)
  //   F: 接頭詞 (preFix)
  //   S: 記号 (Symbol)
  //   X: その他 (other)
  let word = phrase.firstWord;
  // Collect the words that belong to this phrase
  while (word && word.startTime < phrase.endTime) {
    wordsOfPhrase.push(word.text);
    word = word.next;
  }

  console.log(`${phrase.startTime}：「${wordsOfPhrase.join(" ")}」`);

  phrase = phrase.next;
}

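For the phrase, word, and char objects obtained this way, .startTime gives the timing, .text gives the text, .next/.previous move to the next or previous unit, and assigning .animate = () => {} sets the callback function for when playback reaches that piece of the lyrics.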
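
Because phrases, words, and chars are chained through .next, the traversal can be wrapped in a generator. This is a small sketch using stand-in objects; iterateUnits is a hypothetical helper, and the sample data only mimics the text, startTime, and next properties described above.

```javascript
// Hypothetical helper: walk any chain of units linked through .next
function* iterateUnits(first) {
  for (let unit = first; unit; unit = unit.next) {
    yield unit;
  }
}

// Stand-in data shaped like the word objects described above
const w3 = { text: "セレナーデ", startTime: 900, next: null };
const w2 = { text: "・", startTime: 600, next: w3 };
const w1 = { text: "グリーンライツ", startTime: 0, next: w2 };

const texts = [...iterateUnits(w1)].map((w) => w.text);
console.log(texts.join(" ")); // グリーンライツ ・ セレナーデ
```

With real data this could be used as for (const phrase of iterateUnits(player.video.firstPhrase)) { ... }. When iterating words from phrase.firstWord, the same word.startTime < phrase.endTime check used in this article's loops would still be needed to stop at the end of the phrase.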

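A word is a little special: word.pos is the word's part of speech. TextAlive's part-of-speech tagging uses the MeCab CRF model and is not guaranteed to be fully accurate.

Besides simply printing the lyrics, the code here also converts them. The reason is that adding text objects in three.js currently requires a special font file, which is hard to generate for kanji and adds extra load when fetched.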

let phrase = player.video.firstPhrase;
let lyrics = [];
while (phrase) {
  let wordsOfPhrase = [];
  let textMeshes = [];

  let word = phrase.firstWord;
  // Convert the lyrics into three.js Meshes so they can be used in the 3D scene
  while (word && word.startTime < phrase.endTime) {
    wordsOfPhrase.push(`${word.text}（${word.pos}）`);

    // Use a canvas to paint the word onto the Mesh
    const canvas = document.createElement('canvas');
    canvas.width = word.text.length * (512 + 32);
    canvas.height = 512 + 32;
    const ctx = canvas.getContext('2d');

    // Change the font depending on the part of speech
    if (word.pos === "N") {
      ctx.font = "bold 512px serif";
    } else {
      // "sans-serif" is the valid generic family name; "sans" alone is not
      ctx.font = "420px sans-serif";
    }
    ctx.fillStyle = "#393939";
    ctx.fillText(word.text, 0, 512);

    textMeshes.push({
      obj: word,
      mesh: new THREE.Mesh(
        new THREE.PlaneGeometry(word.text.length, 1),
        new THREE.MeshBasicMaterial({ map: new THREE.CanvasTexture(canvas), transparent: true })
      ),
    });

    word = word.next;
  }

  lyrics.push(textMeshes);

  console.log(`${phrase.startTime}：「${wordsOfPhrase.join(" ")}」`);

  phrase = phrase.next;
}
lyrics.forEach((line, lineIdx) => {
  line.forEach((word) => {
    // Fix the y and z position of each word; x is computed in render
    word.mesh.position.y = lineIdx % 3 - 1;
    word.mesh.position.z = lineIdx % 3 - 1;
    word.mesh.visible = false;
    scene.add(word.mesh);
  })
})

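The workaround used here is to draw with the browser's own fonts onto an off-screen canvas and then apply that canvas as a texture on a PlaneGeometry. The drawback is the lack of the bevel that real 3D text would have, but overall the impact is small. (It is roughly like converting an AE text layer to 3D: the text itself has no thickness.)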
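
The sizing and font choice for the canvas can be separated from the DOM calls, which makes them easy to check in isolation. The sketch below assumes the same 512 px glyph size and 32 px padding as the code above; canvasSpecForWord is a hypothetical helper, and "sans-serif" is used as the generic fallback family.

```javascript
const GLYPH_SIZE = 512; // px per character, as in the code above
const PADDING = 32;     // extra px per character and per line, as in the code above

// Hypothetical helper: canvas size, font, and text baseline for one word
function canvasSpecForWord(word) {
  return {
    width: word.text.length * (GLYPH_SIZE + PADDING),
    height: GLYPH_SIZE + PADDING,
    // Nouns get a bold serif face, everything else a lighter sans-serif one
    font: word.pos === "N" ? `bold ${GLYPH_SIZE}px serif` : "420px sans-serif",
    baseline: GLYPH_SIZE,
  };
}

const spec = canvasSpecForWord({ text: "ミク", pos: "N" });
console.log(spec.width, spec.height, spec.font); // 1088 544 bold 512px serif
```

In the browser, the result would feed canvas.width, canvas.height, ctx.font, and the y argument of ctx.fillText.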

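The other way to get lyrics is the onTimeUpdate event mentioned earlier. It passes the current video time to the event handler, which can then call player.video.findPhrase/findWord/findChar to look up the lyrics that should be showing at that moment.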

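// Part of the listener object passed to player.addListener({ ... })
onTimeUpdate(now) {
  // Current lyrics (phrase)
  playerProgress.phrase = player.video.findPhrase(now);

  // Current lyrics (word)
  playerProgress.word = player.video.findWord(now);

  // Current lyrics (char)
  playerProgress.char = player.video.findChar(now);
}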

The phrase, word, and char obtained this way expose .text and the other properties in the same way.
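
Since onTimeUpdate fires many times while the same word is current, it can help to react only when the looked-up unit changes. This sketch uses the same identity comparison that this article's handlers apply to beats and choruses; createChangeDetector is a hypothetical helper, and it assumes the lookup returns the same object for the same unit.

```javascript
// Hypothetical helper: call onChange only when the value differs from the last one seen
function createChangeDetector(onChange) {
  let last = null;
  return (current) => {
    if (current !== last) {
      const previous = last;
      last = current;
      onChange(current, previous);
    }
  };
}

const seen = [];
const onWord = createChangeDetector((word) => seen.push(word ? word.text : null));

const sampleWord = { text: "愛" }; // stand-in for a word object
onWord(sampleWord); // new word: recorded
onWord(sampleWord); // same word again: ignored
onWord(null);       // gap between words: recorded
onWord(sampleWord); // word again: recorded
console.log(JSON.stringify(seen)); // ["愛",null,"愛"]
```

Inside a real handler the call would look like onWord(player.video.findWord(now)).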

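With lyric objects in hand, they can be used in the scene. The approach here keeps the camera pointed at the origin (0, 0, 0) and moves the text. That is comparatively wasteful, but acceptable for a song with a few hundred characters. Since interaction is added later, and the user should be able to move the camera freely, moving the text is a reasonable choice.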

function render() {
  if (playerProgress.isPlaying) {
    // progress: current playback position in ms (how it is computed is shown later in this article)
    // Show lyric objects and update their positions
    lyrics.forEach(line => {
      line.forEach((word, idx) => {
        if (word.obj.startTime < progress && progress < (word.obj.endTime + 200000)) {
          word.mesh.visible = true;
          // The first word mesh of each line is positioned from the time and speedFactor
          // The rest follow the position of the previous word mesh
          word.mesh.position.x = (idx === 0 ?
            ((word.obj.startTime - (progress || 0)) * speedFactor + 10) :
            (line[idx - 1] && (line[idx - 1].mesh.position.x + (line[idx - 1].obj.text.length / 2)) || Number.NEGATIVE_INFINITY)
          ) + (word.obj.text.length / 2);
        } else {
          // Hidden before the word starts and once more than 200 seconds have passed
          word.mesh.visible = false;
        }
      })
    });

    renderer.render(scene, camera);
    requestAnimationFrame(render);
  }
}
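
The x-position rule in render() can be read more easily as a pure function. The sketch below restates it under the same assumptions as the code above (each plane is text.length units wide, the first word enters at x = 10 when progress equals its startTime); layoutLine is a hypothetical name and the sample data is made up.

```javascript
// Hypothetical pure version of the x-position rule used in render()
function layoutLine(words, progress, speedFactor) {
  const centers = [];
  words.forEach((word, idx) => {
    const halfWidth = word.text.length / 2;
    // Left edge: time-driven for the first word, otherwise the right edge of the previous word
    const leftEdge = idx === 0
      ? (word.startTime - progress) * speedFactor + 10
      : centers[idx - 1] + words[idx - 1].text.length / 2;
    // Plane meshes are positioned by their center
    centers.push(leftEdge + halfWidth);
  });
  return centers;
}

const sampleLine = [
  { text: "ab", startTime: 1000 },
  { text: "c", startTime: 1500 },
];
console.log(JSON.stringify(layoutLine(sampleLine, 1000, 0.01))); // [11,12.5]
```

With a positive speedFactor the whole line slides toward negative x as progress grows, which is what makes the text scroll past the fixed camera.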

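Music Interaction

The TextAlive App API provides four kinds of music information that are most likely to be useful: beat, chord progression, chorus, and vocal amplitude. They can likewise be retrieved using the video time supplied by onTimeUpdate.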

// Part of the listener object passed to player.addListener({ ... })
onTimeUpdate(now) {
    // Get and update beat information
    //
    // beat.index: position of the beat in the song, starting from 0
    // beat.length: number of beats in the bar
    // beat.position: position of the beat within the bar, starting from 1
    // beat.next: the next beat
    // beat.previous: the previous beat
    // beat.duration: duration, in ms
    // beat.startTime: start time, in ms
    // beat.endTime: end time, in ms
    let beat = player.findBeat(now);
    if (beat && playerProgress.beat !== beat) {
      playerProgress.beat = beat;
      console.log("update beat:", beat);
      // Draw a beat ring
      // 5 points ring
      // facing x-axis
      let geometry = new THREE.RingGeometry(4.9, 5, 5);
      let material = new THREE.MeshBasicMaterial({

        // Colored while a chorus is playing, gray otherwise
        color: colorful ? colorSpinner.pick() : 0x393939,
        side: THREE.DoubleSide
      });

      // Solid ring on the first beat of each bar, wireframe for the others
      let circle = beat.position === 1 ? new THREE.Mesh(geometry, material) : new THREE.Line(geometry, material);
      circle.position.x = 15;
      // rotateY takes radians: a quarter turn makes the ring face along the x-axis
      circle.rotateY(Math.PI / 2);
      scene.add(circle);
      otherMeshes.push(circle);
    }

    // Get and update chord progression information
    //
    // chord.duration: duration, in ms
    // chord.index: position in the song, starting from 0
    // chord.next: the next chord
    // chord.previous: the previous chord
    // chord.startTime: start time, in ms
    // chord.endTime: end time, in ms
    playerProgress.chord = player.findChord(now);

    // Get and update chorus information
    //
    // chorus.duration: duration, in ms
    // chorus.index: position in the song, starting from 0
    // chorus.next: the next chorus
    // chorus.previous: the previous chorus
    // chorus.startTime: start time, in ms
    // chorus.endTime: end time, in ms
    let chorus = player.findChorus(now);
    if (playerProgress.chorus !== chorus) {
      playerProgress.chorus = chorus;
      console.log("update chorus:", chorus);
    }

    // Current lyrics (phrase)
    playerProgress.phrase = player.video.findPhrase(now);

    // Current lyrics (word)
    playerProgress.word = player.video.findWord(now);

    // Current lyrics (char)
    playerProgress.char = player.video.findChar(now);

    // Current vocal amplitude
    playerProgress.volume = player.getVocalAmplitude(now);
  }

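Here the beats are used to spawn pentagonal beat rings in 3D that travel along with the text, which makes the scene richer and conveys a sense of 3D depth.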

// Part of the listener object passed to player.addListener({ ... })
onTimeUpdate(now) {
    let beat = player.findBeat(now);
    if (beat && playerProgress.beat !== beat) {
      playerProgress.beat = beat;
      console.log("update beat:", beat);
      // Draw a beat ring
      // 5 points ring
      // facing x-axis
      let geometry = new THREE.RingGeometry(4.9, 5, 5);
      let material = new THREE.MeshBasicMaterial({

        // Colored while a chorus is playing, gray otherwise
        color: colorful ? colorSpinner.pick() : 0x393939,
        side: THREE.DoubleSide
      });

      // Solid ring on the first beat of each bar, wireframe for the others
      let circle = beat.position === 1 ? new THREE.Mesh(geometry, material) : new THREE.Line(geometry, material);
      circle.position.x = 15;
      // rotateY takes radians: a quarter turn makes the ring face along the x-axis
      circle.rotateY(Math.PI / 2);
      scene.add(circle);
      otherMeshes.push(circle);
    }

    playerProgress.chord = player.findChord(now);

    let chorus = player.findChorus(now);
    if (playerProgress.chorus !== chorus) {
      playerProgress.chorus = chorus;
      console.log("update chorus:", chorus);

      // Switch the ring color: colored inside a chorus, gray outside
      colorful = Boolean(chorus);
    }

    playerProgress.phrase = player.video.findPhrase(now);
    playerProgress.word = player.video.findWord(now);
    playerProgress.char = player.video.findChar(now);
    playerProgress.volume = player.getVocalAmplitude(now);
  }

function render() {
  if (playerProgress.isPlaying) {
    lyrics.forEach(line => {
      line.forEach((word, idx) => {
        if (word.obj.startTime < progress && progress < (word.obj.endTime + 200000)) {
          word.mesh.visible = true;
          word.mesh.position.x = (idx === 0 ?
            ((word.obj.startTime - (progress || 0)) * speedFactor + 10) :
            (line[idx - 1] && (line[idx - 1].mesh.position.x + (line[idx - 1].obj.text.length / 2)) || Number.NEGATIVE_INFINITY)
          ) + (word.obj.text.length / 2);
        } else {
          word.mesh.visible = false;
        }
      })
    });
    // The idea is that beat rings are tied to the music, not to the video time,
    // so three.js's built-in clock supplies a simple time delta instead.
    // During playback the rings behave essentially like the lyrics,
    // but while the timeline is being dragged they keep moving at their usual rate,
    // and since no sound is playing, no new rings should appear.
    const diff = clock.getDelta();
    otherMeshes.forEach((mesh) => {
      if (mesh.position.x > -100) {
        mesh.position.set(mesh.position.x - (diff * 1000 * speedFactor), 0, 0);
        mesh.rotateZ(diff * -0.1 * Math.PI);
      } else {
        mesh.visible = false;
        // Also detach off-screen rings so they do not pile up in the scene
        scene.remove(mesh);
      }
    });
    otherMeshes = otherMeshes.filter(o => o.visible);

    renderer.render(scene, camera);
    requestAnimationFrame(render);
  }
}

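User Interaction

User interaction is one of the more distinctive aspects of this contest. Pure animation is not very different from a PV, but adding interactivity opens up further use cases and plays to the strengths of code as a medium.

Only two simple interactions are added here: camera sway and timeline control. The more popular VR/AR and other effects follow a similar idea.

Camera Sway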

renderer.domElement.addEventListener("mousemove", (event) => {
  const rect = renderer.domElement.getBoundingClientRect();

  // Compute the pointer position as a fraction and scale it to [-1.0, 1.0]
  const aspect = - 1 + 2 * ((event.clientY - rect.top) / rect.height);

  // The three.js camera follows, moving up to 15 degrees up or down
  // 15deg ~= 0.2618rad
  // Keep looking at 0, 0, 0
  camera.position.y = Math.sin(0.2618 * aspect) * 5;
  camera.position.z = Math.cos(0.2618 * aspect) * 5;
  camera.lookAt(scene.position);
});

This moves the camera up and down by ±15 degrees in the y-z plane, along a circle centered on the origin.
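
The geometry can be checked without a browser by pulling the math out of the event handler. This is a sketch with the same constants as the handler (15 degrees, radius 5); cameraOffset is a hypothetical name.

```javascript
const MAX_ANGLE = (15 * Math.PI) / 180; // 15 degrees in radians (~0.2618)
const RADIUS = 5;                        // camera distance from the origin in the y-z plane

// Hypothetical pure version of the mousemove math
function cameraOffset(clientY, rectTop, rectHeight) {
  // Map the pointer's vertical position inside the element to [-1, 1]
  const ratio = -1 + 2 * ((clientY - rectTop) / rectHeight);
  const angle = MAX_ANGLE * ratio;
  return { y: Math.sin(angle) * RADIUS, z: Math.cos(angle) * RADIUS };
}

// Pointer at the vertical center of a 600 px tall element
const center = cameraOffset(300, 0, 600);
console.log(center.y, center.z); // 0 5
```

Because y and z come from the sine and cosine of the same angle, the camera always stays exactly RADIUS away from the origin in the y-z plane, whatever the pointer position.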

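Timeline Control

The original idea was to control the timeline by dragging along the x axis, but one weakness of web events is that a click is hard to tell apart from mousedown/mouseup, and the drag events are incomplete as well. So a home-made click emulator is used here; an existing library would also be fine.

In practice, video playback is paused while the timeline is being moved; on mouseup, player.requestMediaSeek writes the time back to the player and playback resumes.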

let isPanning = false;
let lastMouseDown = null;
let lastMouseDownAt = null;
let panFrom = null;
renderer.domElement.addEventListener("mousedown", (event) => {
  lastMouseDown = Date.now();
  // Only the x coordinate is needed for horizontal panning
  lastMouseDownAt = { x: event.clientX, y: null };
  event.preventDefault();
});
renderer.domElement.addEventListener("mouseup", (event) => {
  if (Date.now() - lastMouseDown < 200) {
    // A mousedown-up shorter than 200 ms counts as a click
    if (playerProgress.isPlaying) {
      player.requestPause();
    } else {
      player.requestPlay();
    }
  } else if (isPanning) {
    // requestMediaSeek has a delay of roughly 0.2-0.5 seconds
    // Updating groundZero ahead of time reduces visible jumps
    groundZero = Date.now() - playerProgress.position;
    player.requestMediaSeek(playerProgress.position);
    player.requestPlay();
  }
  lastMouseDown = null;
  panFrom = null;
  isPanning = false;
  event.preventDefault();
});
renderer.domElement.addEventListener("mousemove", (event) => {
  const rect = renderer.domElement.getBoundingClientRect();

  // Compute the pointer position as a fraction and scale it to [-1.0, 1.0]
  const aspect = - 1 + 2 * ((event.clientY - rect.top) / rect.height);

  // The three.js camera follows, moving up to 15 degrees up or down
  // 15deg ~= 0.2618rad
  // Keep looking at 0, 0, 0
  camera.position.y = Math.sin(0.2618 * aspect) * 5;
  camera.position.z = Math.cos(0.2618 * aspect) * 5;
  camera.lookAt(scene.position);
  if (lastMouseDown && (isPanning || (playerProgress.isPlaying && (Date.now() - lastMouseDown > 200)))) {
    // Dragging the timeline during playback; can be ignored when not playing
    if (!panFrom) {
      panFrom = playerProgress.position;
    }
    player.requestPause();
    isPanning = true;
    playerProgress.position = panFrom - (30000 * (event.clientX - lastMouseDownAt.x) / rect.width);
  }
});
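
The two decisions made by these handlers, click versus drag and where the drag lands on the timeline, can be isolated as pure functions. This is a sketch using the same 200 ms threshold and 30-second window as above; isClick and panPosition are hypothetical names, and the clamping to [0, duration] is an addition that the handlers above do not perform.

```javascript
const CLICK_THRESHOLD_MS = 200; // same threshold as the handlers above
const PAN_WINDOW_MS = 30000;    // a full-width drag scrubs 30 seconds, as above

// Hypothetical helper: does a mousedown/mouseup pair count as a click?
function isClick(downAt, upAt) {
  return upAt - downAt < CLICK_THRESHOLD_MS;
}

// Hypothetical helper: timeline position (ms) while dragging.
// Clamping to [0, duration] is an addition not present in the original handler.
function panPosition(panFrom, deltaX, width, duration = Infinity) {
  const raw = panFrom - (PAN_WINDOW_MS * deltaX) / width;
  return Math.min(Math.max(raw, 0), duration);
}

console.log(isClick(1000, 1150));           // true
console.log(panPosition(10000, 100, 1000)); // 7000
console.log(panPosition(2000, 500, 1000));  // 0
```

Dragging to the right (positive deltaX) moves to an earlier position, which matches the direction in which the lyrics scroll.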

The render function is updated as well, so the picture does not freeze while the timeline is being moved.

function render() {
  if (playerProgress.isPlaying || isPanning) {

    // While the timeline is being dragged, playerProgress.position is driven by the mouse event handlers and can be used directly
    // During playback playerProgress is only updated at irregular intervals; using it directly would drop frames
    const progress = isPanning ? playerProgress.position : Date.now() - (groundZero || 0);

    // ...

    renderer.render(scene, camera);
  }

  requestAnimationFrame(render);
}
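
The groundZero trick, deriving the position from the wall clock instead of waiting for the next player update, can be packaged as a small clock object. This is a sketch under the assumption that the player reports a position in milliseconds; createPlaybackClock and its methods are hypothetical names, and the time source is injectable so the behavior can be checked deterministically.

```javascript
// Hypothetical helper: extrapolates the playback position between player updates
function createPlaybackClock(now = Date.now) {
  let groundZero = null; // wall-clock time that corresponds to position 0; null while stopped
  let frozen = 0;        // position reported while stopped

  function position() {
    return groundZero === null ? frozen : now() - groundZero;
  }

  return {
    position,
    // Call with the position reported by the player, e.g. on play or after a seek
    sync(reported) { groundZero = now() - reported; },
    // Call on pause so the position stops advancing
    stop() { frozen = position(); groundZero = null; },
  };
}

// Usage with a fake time source so the result is deterministic
let fakeNow = 1000;
const playbackClock = createPlaybackClock(() => fakeNow);
playbackClock.sync(500); // the player reports 500 ms
fakeNow += 16;           // one frame later, no new report yet
console.log(playbackClock.position()); // 516
```

Whether to re-sync on every onTimeUpdate or only on play and seek is a design choice this sketch leaves open; frequent re-syncing keeps the clock accurate but can reintroduce small jumps.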

Source Code

The complete source code is available on GitHub:

MagicalMirai2020ProgrammingContest-Tutorial - GitHub

Demo

References

用 Three.js 來當個創世神 - CK Chuang / iT 邦幫忙鐵人賽
為什麼流行歌聽起來都這麼像？ - 好和弦
開発の始め方 / TextAlive App API (Japanese)
App のライフサイクル / TextAlive App API (Japanese)
TextAlive App Examples (Japanese)
マジカルミライ 2020 プログラミング・コンテスト (Japanese)
TextAlive App API Reference (Japanese, English)
MeCab: Yet Another Part-of-Speech and Morphological Analyzer (Japanese)
【初音ミク「マジカルミライ2020」プログラミング・コンテスト】リリックアプリ開発ことはじめ #mm2020procon - OngaACCEL/OngaCREST (Japanese)
textalive-app-api/community on Gitter chat

Revision History

8th, Oct 2020: Added the event firing order table; fixed the demo page not running on Safari

catLee
