Explanations
all forms of the word "to" and its variations
Negative Logits
Token     Logit
öh        -0.17
ogram     -0.16
jom       -0.15
compat    -0.15
xba       -0.15
andon     -0.14
uzu       -0.14
i         -0.14
Compat    -0.14
izo       -0.14
Positive Logits
Token         Logit
admit         0.32
admitting     0.25
confess       0.24
admission     0.23
admits        0.23
confession    0.22
admitted      0.22
honest        0.20
admissions    0.20
confessed     0.19
Activations Density: 0.028%