INDEX
Explanations
negations and expressions of disagreement or refutation
Negative Logits
863       -0.15
Alive     -0.14
esktop    -0.14
pto       -0.14
ให        -0.14
either    -0.14
either    -0.14
нож       -0.13
pite      -0.13
Alive     -0.13
Positive Logits
isiyle    0.15
ジオ      0.14
ąż        0.14
ETS       0.14
itution   0.14
_LITERAL  0.14
ensch     0.13
just      0.13
vice      0.13
κέ        0.13
Activations Density 0.033%