INDEX
Explanations
references to contentious political issues and debates
Negative Logits
gra         -0.15
/popper     -0.14
vertically  -0.14
еÑĨ         -0.14
pud         -0.14
Smile       -0.14
Vall        -0.14
äch         -0.14
OUCH        -0.14
oring       -0.13
Positive Logits
gif         0.18
gif         0.18
aya         0.17
DAT         0.16
gifs        0.16
ternet      0.15
DERP        0.14
ube         0.14
efon        0.14
shim        0.14
Activations Density 1.038%