INDEX
    Explanations

    references to female characters and their roles in narratives or relationships

    Negative Logits
     himself
    -0.68
    his
    -0.52
     his
    -0.52
    他的
    -0.41
     zijn
    -0.41
     jeho
    -0.36
     seinem
    -0.35
     jego
    -0.35
     его
    -0.35
    妻
    -0.35
    Positive Logits
     herself
    0.94
     her
    0.62
     haar
    0.50
     její
    0.44
    她的
    0.44
     hers
    0.42
     she
    0.41
     ей
    0.41
     ее
    0.40
     её
    0.40
    Activation Density: 0.669%

    No Known Activations