INDEX
    Explanations

    references to groups of people and personal pronouns

    Negative Logits
    (token not shown)   -0.47
    " with"             -0.44
    " and"              -0.40
    ","                 -0.39
    " to"               -0.39
    " in"               -0.38
    " of"               -0.38
    (token not shown)   -0.37
    " a"                -0.36
    "-"                 -0.35
    Positive Logits
    [@BOS@]      1.41
    <unused14>   1.40
    <unused41>   1.40
    <unused79>   1.40
    <unused28>   1.40
    <unused8>    1.40
    <unused43>   1.40
    <unused52>   1.40
    <unused3>    1.39
    <unused16>   1.39
    Activation Density: 0.159%

    No Known Activations