Explanations
specific phrases related to news headlines or video titles
instances of the phrase "JUST WATCHED."
Negative Logits
sis         -0.74
sole        -0.69
ocr         -0.69
Magikarp    -0.67
ogly        -0.67
pill        -0.66
nib         -0.65
nuts        -0.65
nut         -0.65
detached    -0.64
Positive Logits
WATCHED     1.37
Transcript  0.81
VIDEOS      0.81
WATCH       0.73
Countdown   0.73
Videos      0.73
Task        0.70
Scenes      0.69
IMAGES      0.69
reacts      0.68
Activations Density 0.008%