INDEX
Explanations
words that express a sense of negation or non-conformity
Negative Logits
isko     -0.14
rai      -0.13
rich     -0.13
927      -0.13
Orr      -0.13
Clem     -0.13
...      -0.13
jie      -0.13
ueva     -0.13
Acres    -0.13
Positive Logits
atur     0.19
erken    0.17
oth      0.17
(er      0.17
anou     0.17
alars    0.15
à¹Ĩ      0.15
theless  0.15
facto    0.15
åĵ       0.15
Activations Density 0.109%
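As a rough illustration of what a density figure like this typically means, the sketch below assumes it is the fraction of sampled tokens on which the feature's activation exceeds zero, shown as a percentage. That definition, the `threshold` parameter, and the helper names `activation_density` and `as_percent` are assumptions for illustration, not something stated on this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    activations: one activation value per sampled token (assumed input).
    Returns 0.0 for an empty sample rather than dividing by zero.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction, digits=3):
    """Format a fraction as a percentage string with `digits` decimals."""
    return f"{fraction * 100:.{digits}f}%"


# Toy usage: one active token out of four sampled tokens.
toy_density = activation_density([0.0, 0.0, 0.5, 0.0])
toy_label = as_percent(toy_density)
```

Under this reading, a density of 0.109% would mean the feature fires on roughly one token in every 900 sampled.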