    Explanations

    references to individuals and their personal achievements or experiences

    Negative Logits
     himself
    -0.30
    妻
    -0.25
     stesso
    -0.22
    /she
    -0.21
     Himself
    -0.21
     نفسه
    -0.20
     Jr
    -0.19
     koji
    -0.17
    his
    -0.17
    くん
    -0.17
    Positive Logits
     herself
    0.48
     сама
    0.24
    /he
    0.23
     могла
    0.23
    athed
    0.22
    丈夫
    0.21
     должна
    0.21
    ová
    0.21
     стала
    0.20
     misma
    0.19
    Act Density 2.455%

    No Known Activations