INDEX
Explanations
references to censorship and banned media
Negative Logits
Token     Logit
exact     -0.17
exact     -0.15
orr       -0.14
éŁ¿       -0.14
urse      -0.14
باÙĨ      -0.14
uales     -0.14
curse     -0.14
chests    -0.14
yd        -0.13
Positive Logits
Token     Logit
ë§ŀ       0.16
abic      0.16
onus      0.15
avery     0.15
ĨĴ        0.15
ома       0.15
beeld     0.15
-LAST     0.15
iosper    0.15
extField  0.14
Activations Density 0.245%