INDEX
Explanations
references to individuals and their personal achievements or experiences
Negative Logits
himself    -0.30
妻         -0.25
stesso     -0.22
/she       -0.21
Himself    -0.21
نفسه       -0.20
Jr         -0.19
koji       -0.17
his        -0.17
くん       -0.17
Positive Logits
herself    0.48
сама       0.24
/he        0.23
могла      0.23
athed      0.22
丈夫       0.21
должна     0.21
ová        0.21
стала      0.20
misma      0.19
Activations Density 2.455%