Explanations
possessive pronouns and references to individual identity or relationships
Negative Logits
selves        -0.18
ãĥĥãĥĪ        -0.17
efon          -0.16
ufe           -0.15
emachine      -0.15
æĩ            -0.14
öm           -0.13
mouths        -0.13
áºŃp          -0.13
theses        -0.13

Positive Logits
name           0.18
status         0.17
fate           0.17
rud            0.16
ucci           0.16
supporters     0.16
Supporters     0.15
ding           0.15
tatus          0.14
whereabouts    0.14
Activations Density: 0.237%