INDEX
    Explanations

    mentions of groups of people using pronouns like 'they' and 'we'

    auxiliary verbs and pronouns

    Negative Logits
    Token          Logit
    " their"       -1.17
    "their"        -1.17
    "Their"        -1.16
    " Their"       -1.06
    " thier"       -0.91
    " kanilang"    -0.89
    "他们的"       -0.89
    " THEIR"       -0.88
    "他們的"       -0.86
    " leur"        -0.84
    Positive Logits
    Token          Logit
    " are"         1.16
    " have"        0.85
    " aren"        0.85
    " were"        0.79
    " don"         0.76
    " want"        0.74
    " know"        0.73
    " introduce"   0.73
    " join"        0.71
    " come"        0.70
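
For readers unfamiliar with these lists, the following is a minimal sketch of one common way such rankings are produced. It assumes the feature has a decoder direction in the model's residual space and that the logit effect on each token is that direction projected through the unembedding matrix. The page itself does not state its method, so this is an assumption. The function name `top_logit_effects`, the toy vocabulary, and all numbers below are hypothetical and are not taken from this feature.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive per-token logit effects.

    decoder_direction: shape (d_model,), the feature's output direction (assumed).
    unembed: shape (d_model, vocab_size), the model's unembedding matrix (assumed).
    vocab: list of token strings, one per unembedding column.
    """
    effects = decoder_direction @ unembed      # shape (vocab_size,)
    order = np.argsort(effects)                # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy data, purely illustrative: 2-dimensional model, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_direction = np.array([1.0, 0.0])
unembed = np.array([
    [-2.0, 1.5, 0.5, 0.0],
    [0.3, -0.2, 0.1, 0.7],
])

negative, positive = top_logit_effects(decoder_direction, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, tokens in the negative list are those the feature pushes the model away from predicting next, and tokens in the positive list are those it promotes.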
    Activation Density: 1.892%

    No Known Activations