Explanations
phrases that convey possession or existence
Negative Logits
itself    -0.53
its       -0.40
Its       -0.35
Its       -0.32
它        -0.28
its       -0.25
оно       -0.25
яке       -0.23
was       -0.20
خودش      -0.20
Positive Logits
themselves    0.71
their         0.49
Their         0.45
Their         0.45
their         0.44
thems         0.35
leurs         0.35
их            0.34
jejich        0.33
leur          0.32
Activations Density 0.275%
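
Non-ASCII entries in token lists like the ones above are often exported in the byte-to-character alphabet used by GPT-2-style byte-level BPE vocabularies, so that 它 shows up as å®ĥ and их as иÑħ. The sketch below shows one way to turn such raw strings back into readable text. It assumes the vocabulary uses the GPT-2 byte mapping, which is not stated on this page; the helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table (assumed mapping)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Decode a raw byte-level token string into readable text.

    Characters outside the byte alphabet (for example Cyrillic letters that
    were already decoded by an earlier step) are passed through unchanged.
    Caveat: already-decoded Latin-1 characters such as 'é' are ambiguous and
    will be treated as raw bytes.
    """
    out = bytearray()
    for ch in raw:
        if ch in UNICODE_TO_BYTE:
            out.append(UNICODE_TO_BYTE[ch])
        else:
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")
```

Calling `decode_token` on each raw token string before display leaves plain ASCII tokens such as `their` untouched, since printable ASCII maps to itself in this table.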