INDEX
    Explanations

    pronouns and their relationships to actions and states

    Negative Logits
    Token            Logit
    "stanov"         -0.15
    "ih"             -0.15
    "orer"           -0.14
    "mine"           -0.14
    "aro"            -0.14
    " Bras"          -0.14
    " Trent"         -0.14
    "-pages"         -0.14
    "Long"           -0.14
    "pages"          -0.14

    Positive Logits
    Token            Logit
    " Ink"            0.15
    "quin"            0.14
    " unresolved"     0.14
    "/../"            0.14
    "ìĹĦ"             0.14
    "WI"              0.14
    " moms"           0.13
    " pol"            0.13
    " unlike"         0.13
    "Ñģли"            0.13
    Activation Density 0.177%

    No Known Activations