INDEX
    Explanations

    occurrences of the pronoun "they."

    Negative Logits
    Token            Logit
    " itself"        -0.28
    "was"            -0.21
    " isnt"          -0.16
    "less"           -0.16
    "ial"            -0.16
    "st"             -0.15
    "(es"            -0.15
    "ly"             -0.15
    "ther"           -0.15
    " isn"           -0.15

    Positive Logits
    Token            Logit
    "’re"            0.45
    "'re"            0.40
    " are"           0.40
    " themselves"    0.38
    " were"          0.34
    "'ve"            0.33
    "’ve"            0.32
    " aren"          0.28
    "'ll"            0.28
    "’ll"            0.27

    Act Density 0.214%

    No Known Activations