INDEX
    Explanations

    references to female characters and their emotions or actions

    Negative Logits
    "妻"           -0.27
    " wife"        -0.26
    " Wife"        -0.22
    "/she"         -0.21
    "wife"         -0.21
    " himself"     -0.20
    " ÙĨÙ쨳Ùĩ"      -0.19
    "sing"         -0.17
    " seul"        -0.16
    "ship"         -0.16
    Positive Logits
    "/he"          0.39
    " herself"     0.39
    "athed"        0.36
    "pher"         0.35
    "esh"          0.33
    "pherd"        0.30
    "ikh"          0.30
    "ffield"       0.27
    "athing"       0.27
    "pard"         0.26
    Act Density 0.147%

    No Known Activations