Explanations
possessive pronouns and references to individuals' relationships or connections to others
Negative Logits
Token      Logit
arkan      -0.08
modes      -0.07
chia       -0.06
ansa       -0.06
weather    -0.06
andro      -0.06
zon        -0.06
BN         -0.06
973        -0.06
actory     -0.06
Positive Logits
Token          Logit
own            0.08
target         0.07
itself         0.07
_own           0.06
own            0.06
surroundings   0.06
favourite      0.06
opponents      0.06
Maker          0.06
uggy           0.06
Activations Density: 0.100%