    Explanations

    pronouns referring to people and their actions

    Negative Logits
    Token         Logit
    " aig"        -0.61
    " inva"       -0.60
    "ssz"         -0.58
    " mín"        -0.56
    " accesso"    -0.56
    " incu"       -0.56
    " olas"       -0.55
    " pomo"       -0.55
    " pary"       -0.55
    "を取る"      -0.54

    Positive Logits
    Token         Logit
    " he"         1.50
    " He"         1.36
    " she"        1.33
    "He"          1.31
    "she"         1.25
    " himself"    1.24
    "She"         1.23
    "himself"     1.20
    "THEY"        1.18
    " She"        1.17

    Activation Density: 0.181%

    No Known Activations