INDEX
Explanations
words that indicate racial or social injustice themes
Negative Logits
consape      -0.52
IMHO         -0.50
可以说       -0.46
可以说是     -0.46
dürfte       -0.46
ครับ         -0.45
IMO          -0.45
かなり       -0.43
imo          -0.42
%@",         -0.42
Positive Logits
somehow          1.32
Somehow          0.96
magically        0.96
supposedly       0.96
яко              0.93
angeb            0.88
Somehow          0.87
supuestamente    0.81
irgendwie        0.74
miraculously     0.73
Activations Density 1.087%