INDEX
Explanations
references to female characters and their roles in various contexts
Negative Logits
himself
-0.29
妻
-0.25
stesso
-0.21
/she
-0.19
Himself
-0.19
sing
-0.18
his
-0.18
نفسه
-0.18
Jr
-0.18
俊
-0.18
Positive Logits
herself
0.54
могла
0.25
сама
0.24
athed
0.22
должна
0.22
丈夫
0.21
стала
0.20
ová
0.20
/he
0.19
сказала
0.19
Activations Density 3.716%