Explanations
the presence of forms of the verb "to be" in various contexts
Negative Logits
oyer        -0.20
ä¾          -0.14
ovice       -0.14
Unnamed     -0.14
zens        -0.13
armor       -0.13
Animating   -0.13
apter       -0.13
istine      -0.13
vore        -0.13
Positive Logits
among       0.18
professor   0.16
native      0.16
nothing     0.15
een         0.15
our         0.15
natives     0.15
from        0.15
originally  0.15
atz         0.15
Activations Density 0.063%