    Explanations

    references to female characters and their roles in various contexts

    Negative Logits
     himself     -0.29
    妻           -0.25
     stesso      -0.21
    /she         -0.19
     Himself     -0.19
    sing         -0.18
    his          -0.18
     نفسه        -0.18
     Jr          -0.18
    俊           -0.18
    Positive Logits
     herself     0.54
     могла       0.25
     сама        0.24
    athed        0.22
     должна      0.22
    丈夫         0.21
     стала       0.20
    ová          0.20
    /he          0.19
     сказала     0.19
    Activation Density: 3.716%

    No Known Activations