Topic Visualization

Interactive LDA Topic Modeling of Your YouTube Watch History

The topic model is built from Google Takeout's Watch History data, specifically the video titles.
The titles are modeled using Latent Dirichlet Allocation (LDA), a generative probabilistic model of a corpus.
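
To make "generative" concrete, here is a minimal sketch of the story LDA assumes for each document (here, one video title): draw a topic mixture, then for every word pick a topic and then a word from that topic. The vocabulary, the two topics, and the alpha values are made up for illustration and are not taken from this project's model.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_title(topic_word_probs, vocab, alpha, length, rng):
    """Generate one synthetic title following LDA's generative process."""
    # 1. Draw this title's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Choose a topic for this word position.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. Choose a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topic_word_probs[topic])[0])
    return theta, words

# Hypothetical vocabulary and topics, for illustration only.
vocab = ["recipe", "pasta", "goal", "match"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # a "cooking"-like topic
    [0.0, 0.0, 0.5, 0.5],  # a "football"-like topic
]

rng = random.Random(0)
theta, words = generate_title(topic_word_probs, vocab, alpha=[0.5, 0.5], length=5, rng=rng)
print(theta, words)
```

Fitting LDA runs this story in reverse: given only the observed titles, it estimates the topic mixtures and the per-topic word distributions.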

Intertopic Distance Map (L)
The size of each circle indicates how much of the corpus the topic covers, that is, the share of words that belong to it. The distance between circles represents how similar the topics are: the closer two circles are, the more similar their topics.
If two circles overlap, the corresponding topics are closely related.
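
This page does not state which distance it uses. LDAvis-style maps commonly compare the topics' word distributions with the Jensen-Shannon divergence and then project the distances to two dimensions, so the sketch below assumes that measure. The three topic distributions are invented for illustration.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Symmetric Jensen-Shannon divergence between two word distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.5, 0.4, 0.1, 0.0]
topic_b = [0.4, 0.5, 0.1, 0.0]  # shares most of its words with topic_a
topic_c = [0.0, 0.0, 0.1, 0.9]  # shares almost nothing with topic_a

near = jensen_shannon(topic_a, topic_b)
far = jensen_shannon(topic_a, topic_c)
print(near < far)  # topics with similar words end up closer together
```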

Top-30 Most Relevant Terms for Topics (R)
Each bar corresponds to one of the leading keywords that characterize the selected topic.
Keywords are ranked by two criteria, frequency within the topic and discriminative power, and the balance between them can be adjusted through the lambda parameter (λ).

  1. λ → 1: prioritizes the words that occur most frequently within each topic
  2. λ → 0: emphasizes words that are distinctive to a topic (i.e., far more frequent within that topic than in the corpus overall)
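
The effect of λ can be sketched with the relevance score used by LDAvis-style visualizations, λ · log p(w|t) + (1 − λ) · log(p(w|t) / p(w)); that this page uses exactly this formula is an assumption. The words and probabilities below are invented for illustration.

```python
import math

def relevance(p_word_given_topic, p_word, lam):
    """Relevance of a word to a topic: a lambda-weighted mix of
    log probability within the topic and log lift over the whole corpus."""
    return (lam * math.log(p_word_given_topic)
            + (1 - lam) * math.log(p_word_given_topic / p_word))

def top_terms(topic_probs, corpus_probs, lam, n=30):
    """Rank the words of one topic by relevance, highest first."""
    scored = {w: relevance(p, corpus_probs[w], lam)
              for w, p in topic_probs.items() if p > 0}
    return sorted(scored, key=scored.get, reverse=True)[:n]

# Hypothetical probabilities: p(word | topic) and p(word) in the whole corpus.
topic_probs = {"video": 0.30, "official": 0.20, "sourdough": 0.05}
corpus_probs = {"video": 0.25, "official": 0.15, "sourdough": 0.005}

print(top_terms(topic_probs, corpus_probs, lam=1.0))  # ranked by frequency in the topic
print(top_terms(topic_probs, corpus_probs, lam=0.0))  # ranked by distinctiveness
```

With these numbers, "video" leads at λ = 1 because it is the most frequent word in the topic, while "sourdough" leads at λ = 0 because it is rare elsewhere in the corpus.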

Watch Period: 28 Jan, 2008 - 27 Oct, 2023

Project Description

This is a data-driven life-logging visualization project by master's students at ViBA Lab.
Feel free to ask questions.