INDEX
Explanations
possessive forms or contractions indicating ownership or association
Negative Logits
Token       Logit
the         -0.26
a           -0.21
an          -0.19
these       -0.18
those       -0.18
some        -0.18
something   -0.17
人員        -0.17
this        -0.17
what        -0.17
Positive Logits
Token       Logit
own         0.52
newest      0.41
latest      0.38
biggest     0.34
own         0.33
largest     0.32
entire      0.30
'           0.29
Own         0.29
youngest    0.29
Activations Density 0.350%