Explanations
instances of self-reference and personal achievement
Negative Logits
yourselves      -0.23
сами            -0.22   (Russian, "themselves")
themselves      -0.21
itself          -0.18
шла             -0.18   (Russian, "went", feminine)
HIS             -0.17
collectively    -0.17
могла           -0.17   (Russian, "could", feminine)
collective      -0.17
His             -0.16
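Positive Logits
himself          0.52
Himself          0.32
должен           0.22   (Russian, "must", masculine)
aped             0.20
deren            0.19   (German, "whose / their")
persona          0.19
был              0.19   (Russian, "was", masculine)
sám              0.19   (Czech, "himself / alone")
personally       0.19
نفسه             0.18   (Arabic, "himself")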
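The dashboard does not say how these logit scores are produced. A common approach in sparse-autoencoder dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result, so that positive scores are tokens the feature pushes up and negative scores are tokens it pushes down. The sketch below uses hypothetical function names (`logit_weights`, `top_and_bottom`) and invented two-dimensional vectors; it illustrates the ranking step only and does not reproduce the numbers above.

```python
# Minimal sketch (assumption): score each vocabulary token by projecting a
# feature's decoder direction onto that token's unembedding vector, then keep
# the k highest and k lowest scores. Names and numbers are hypothetical.

def logit_weights(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])} for every token.

    decoder_dir: list of floats, length d_model (the feature's decoder row).
    unembed: dict mapping token -> list of floats, length d_model
             (that token's column of the unembedding matrix).
    """
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) lists of (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


# Toy two-dimensional example; these vectors are invented for illustration.
decoder_dir = [1.0, 0.5]
unembed = {
    "himself": [0.4, 0.2],
    "persona": [0.1, 0.1],
    "themselves": [-0.1, -0.2],
}
positive, negative = top_and_bottom(logit_weights(decoder_dir, unembed), k=1)
print("positive:", positive)
print("negative:", negative)
```

In a real model the same dot product would run over the full vocabulary, typically as a single matrix-vector product rather than a Python loop.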
Activation Density: 0.075%
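Read literally, a density of 0.075% means the feature is active on roughly 3 of every 4,000 token positions (about 1 in 1,333). The dashboard does not state the dataset or the activation threshold behind this figure, so the sketch below assumes the simplest definition: the fraction of positions where the activation is strictly greater than zero. The function name `activation_density` and the sample data are hypothetical.

```python
# Minimal sketch (assumption): activation density as the fraction of token
# positions on which the feature's activation is strictly above a threshold.

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions where activation > threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 3 of 4,000 token positions.
sample = [0.0] * 3997 + [1.2, 0.8, 2.5]
density = activation_density(sample)
print(f"Activation Density: {density:.3%}")
```

Raising `threshold` above zero would count only stronger activations and lower the reported density, which is one reason densities from different tools are not always directly comparable.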