Explanations
YouTube video IDs (v=XXXXXXXXXX)
the presence of URLs and references to video content
Negative Logits
transf      -0.72
derog       -0.70
Palest      -0.70
cler        -0.69
Levant      -0.69
Minor       -0.68
reper       -0.68
Scroll      -0.68
loophole    -0.65
pretext     -0.61
Positive Logits
_-          1.26
DAQ         1.03
OY          1.01
ZX          0.96
Ec          0.96
_           0.96
AY          0.95
xus         0.95
uv          0.95
OX          0.94
Activations Density 0.018%