INDEX
Explanations
prepositions linking phrases
prepositions
Negative Logits
in    0.49
ే     0.47
K     0.43
J     0.43
Q     0.41
ა     0.41
G     0.41
Z     0.41
D     0.40
B     0.39
Positive Logits
\     0.49
to    0.48
(     0.42
was   0.41
is    0.38
(     0.38
-     0.37
いて   0.36
it    0.35
ного  0.35
Activations Density: 7.062%