Explanations
references to historical events or figures
Negative Logits
Wanna    -0.16
นน       -0.15
Zuk      -0.15
bard     -0.15
telev    -0.15
ırak     -0.14
@js      -0.14
Sadd     -0.14
opsis    -0.14
ause     -0.14
Positive Logits
188          0.19
liberalism   0.17
191          0.16
187          0.16
censor       0.15
provisional  0.15
pedestal     0.15
Mason        0.15
liberals     0.15
189          0.15
Activations Density 0.062%