INDEX
Explanations
video-related cues or instructions
references to watching videos or content
Negative Logits
ctrl          -0.91
phi           -0.76
jury          -0.71
sembly        -0.67
interstitial  -0.66
ヴ            -0.65
VEN           -0.63
rhy           -0.62
bably         -0.61
lettuce       -0.60
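One negative-logit token, ヴ (katakana VU, U+30F4), is stored in GPT-2-style byte-level BPE vocabularies as the string ãĥ´. Such vocabularies remap each raw UTF-8 byte to a printable character. The sketch below reverses that mapping. It assumes the model uses GPT-2's byte-to-unicode table, which the string suggests but this page does not confirm. `decode_token` is a hypothetical helper name.

```python
# Sketch: recover the text behind a GPT-2-style byte-level BPE token string.
# Assumption: the vocabulary uses GPT-2's byte-to-unicode table, rebuilt here.

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to a printable character, as GPT-2's tokenizer does."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    extra = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + extra)
            extra += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))

# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Turn a vocabulary string back into raw bytes, then decode them as UTF-8."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(hex(ord(decode_token("ãĥ´"))))  # 0x30f4 (ヴ)
```

Here ã, ĥ and ´ map back to the bytes E3, 83 and B4, which together form the UTF-8 encoding of U+30F4.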
Positive Logits
tower          1.28
Watching       1.10
dog            1.05
dogs           0.97
ing            0.85
imon           0.84
watch          0.83
points         0.81
watching       0.80
Watch          0.79
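Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix. The k highest and k lowest per-token scores are then kept. The sketch below assumes that procedure, which this page does not state. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the toy weights and three-token vocabulary are made up for illustration.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting one feature's decoder
    direction (length d_model) through the unembedding (d_model x vocab)."""
    vocab_size = len(w_unembed[0])
    return [
        sum(d * row[v] for d, row in zip(decoder_dir, w_unembed))
        for v in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # highest scores first
    negative = ranked[:k]        # lowest scores first
    return positive, negative

# Toy example (made-up numbers): d_model = 2, vocabulary of 3 tokens.
vocab = ["dog", "lettuce", "tower"]
w_unembed = [[1.0, -0.5, 2.0],
             [0.0,  0.0, 1.0]]
scores = logit_effects([1.0, 0.5], w_unembed)  # [1.0, -0.5, 2.5]
print(top_and_bottom(scores, vocab, k=1))
# ([('tower', 2.5)], [('lettuce', -0.5)])
```

Real dashboards work with a full vocabulary and a learned decoder vector. The ranking step is the same; only the matrix sizes change.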
Activation Density: 0.020%
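Activation density is usually read as the share of sampled tokens on which the feature fires. A density of 0.020% is a fraction of 0.0002, or roughly 1 token in 5,000 (0.0002 × 5,000 = 1). The sketch below makes two assumptions: "fires" means an activation strictly above zero, and the sample is a flat list of per-token activations. The dashboard's exact sampling and threshold are not given here, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# One firing in 5,000 tokens reproduces the 0.020% figure above.
sample = [0.0] * 4999 + [3.2]
print(f"{activation_density(sample):.3f}%")  # 0.020%
```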