Explanations
instances of the indefinite article "a"
Negative Logits
embre    -0.15
tolik    -0.15
oki      -0.15
moth     -0.15
itten    -0.14
tok      -0.14
252      -0.14
okoj     -0.14
è¶       -0.14
fov      -0.14
Positive Logits
reason    0.25
reasons   0.24
sake      0.24
purposes  0.23
agers     0.18
cela      0.18
instance  0.18
sure      0.17
ay        0.16
reason    0.16
Activations Density: 0.051%