INDEX
Explanations
reflexive pronouns and phrases indicating self-reference
Negative Logits
hunne         -0.59
rijke         -0.58
itinéraires   -0.56
Bani          -0.56
dars          -0.55
tiros         -0.55
Racine        -0.53
Pik           -0.53
ENOS          -0.52
ary           -0.52
Positive Logits
itself        1.35
itself        1.32
Itself        1.24
Roskov        1.03
himself       1.00
sendiri       0.96
Himself       0.95
himself       0.90
themselves    0.89
herself       0.86
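Dashboards of this kind commonly produce the two logit lists by projecting a feature's decoder direction onto the model's unembedding matrix, then taking the highest and lowest scores. Whether these exact numbers were computed that way is an assumption. The sketch below is a minimal pure-Python version of that projection; `top_logit_effects`, `decoder_dir`, and `unembed_rows` are hypothetical names, and `unembed_rows` holds one row per vocabulary token (the transpose of a `d_model x vocab` unembedding).

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed_rows: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by how much a feature direction
    pushes their logits up (positive) or down (negative).

    Assumes unembed_rows[i] is the unembedding vector for vocab[i].
    """
    # Score each token as the dot product of the feature's decoder
    # direction with that token's unembedding vector.
    scores = [
        sum(d * u for d, u in zip(decoder_dir, row))
        for row in unembed_rows
    ]
    # Sort ascending so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy usage with made-up two-dimensional vectors.
pos, neg = top_logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed_rows=[[1.3, 0.0], [0.9, 0.0], [-0.5, 0.0]],
    vocab=["itself", "himself", "rijke"],
    k=1,
)
print(pos, neg)
```

Repeated entries such as the two `itself` rows are plausibly distinct tokens (for example, with and without a leading space) whose whitespace was lost when the page was flattened.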
Activation Density: 0.105%
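Activation density is usually reported as the percentage of sampled tokens on which a feature fires, i.e. has an activation above zero. The sketch below assumes that definition; the function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard's actual token sample and firing threshold are not stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered "active".
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 2.5, 0.0]))
```

Under this definition, 0.105% means the feature fires on roughly one token in a thousand, which fits a narrow concept such as reflexive self-reference.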