INDEX
Explanations
negative contractions, particularly "not" and its variations
Negative Logits
adem  -0.17
گر  -0.15
мага  -0.14
hop  -0.14
unga  -0.14
uit  -0.14
\Mapping  -0.13
shouldBe  -0.13
alth  -0.13
äl  -0.13
Positive Logits
know  0.24
care  0.23
necessarily  0.22
have  0.19
Know  0.19
know  0.19
even  0.19
mind  0.18
Know  0.18
quite  0.17
Activations Density 0.195%
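
The source does not say how the density figure is computed. A common reading is the fraction of sampled token positions on which the feature's activation is non-zero. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active token positions) / (all sampled positions).
    """
    if len(activations) == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2000 token positions, with the feature active on 4 of them.
sample = [0.0] * 1996 + [1.2, 0.4, 3.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.195% would mean the feature is active on roughly 2 of every 1,000 sampled tokens.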