INDEX
Explanations
prepositions followed by determiners
Negative Logits
Token   Value
7       0.36
۶       0.33
4       0.32
3       0.32
6       0.32
9       0.31
with    0.31
5       0.31
8       0.31
2       0.29
Positive Logits
Token   Value
the     0.74
this    0.53
our     0.53
the     0.49
The     0.48
these   0.46
your    0.45
The     0.43
their   0.43
該      0.41
Activations Density: 3.725%