INDEX
Explanations
negations of the verb "to be."
Negative Logits
hin      -0.14
not      -0.14
hone     -0.14
es       -0.14
inski    -0.14
no       -0.14
csr      -0.13
din      -0.13
æĹ¦      -0.13
nicht    -0.13
Positive Logits
necessarily  0.21
ori          0.18
yet          0.17
anymore      0.17
zsche        0.17
apos         0.17
ches         0.17
ibble        0.16
ango         0.15
quite        0.15
Activations Density 0.178%