INDEX
Explanations
phrases indicative of social or political critique
Negative Logits
Token      Logit
erland     -0.15
asher      -0.15
\Carbon    -0.14
plib       -0.14
ucz        -0.14
lopedia    -0.14
ummer      -0.14
ENSE       -0.14
asmus      -0.14
ownik      -0.14
Positive Logits
Token      Logit
affle      0.14
éłĨ        0.14
pler       0.14
btnCancel  0.14
ĶåĽŀ       0.14
sk         0.14
Vis        0.14
olle       0.13
467        0.13
æºĸ        0.13
Activations Density 0.355%
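
The density figure above is not defined on this page. As a rough sketch only, and assuming it means the share of sampled tokens on which the feature has a nonzero activation, it could be computed as follows. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the feature fires.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count tokens where the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
sample = [0.0, 0.0, 1.5, 0.0]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.355% would mean the feature is active on roughly 1 in every 280 tokens.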