INDEX
    Explanations

    references to collective pronouns related to groups or individuals

    Negative Logits
    Token          Logit
    " itself"      -0.22
    "was"          -0.16
    "ocate"        -0.15
    "nad"          -0.14
    "nut"          -0.14
    "st"           -0.14
    "atti"         -0.13
    " isnt"        -0.13
    "nbsp"         -0.13
    "page"         -0.13
    Positive Logits
    Token          Logit
    " themselves"  0.37
    "’re"          0.36
    " are"         0.34
    "'re"          0.34
    "'ve"          0.29
    "’ve"          0.26
    " were"        0.25
    "'ll"          0.25
    "’ll"          0.23
    "/she"         0.23
    Act Density 0.213%

    No Known Activations