INDEX
    Explanations

    references to relationships and personal attributes of individuals

    Negative Logits
    " herself"      -0.33
    " Frau"         -0.28
    " Woman"        -0.28
    "woman"         -0.28
    " woman"        -0.27
    " female"       -0.27
    " actresses"    -0.26
    " actress"      -0.26
    "atrice"        -0.26
    " Actress"      -0.26
    Positive Logits
    " guy"          0.33
    "男子"          0.31
    " men"          0.31
    " boys"         0.30
    " guys"         0.30
    " male"         0.30
    " gentlemen"    0.30
    " boy"          0.28
    " handsome"     0.28
    " males"        0.28
    Act Density 0.786%

    No Known Activations