INDEX
    Explanations

    references to the pronoun "them."

    Negative Logits
    " RTX"           -0.80
    "mire"           -0.75
    "Limit"          -0.65
    "Charg"          -0.64
    " Farn"          -0.63
    "Press"          -0.63
    "083"            -0.61
    "Pause"          -0.60
    "politics"       -0.60
    " Fulton"        -0.60
    Positive Logits
    "atic"           1.10
    "atically"       1.01
    "selves"         0.91
    " perished"      0.84
    " selves"        0.84
    " were"          0.82
    "alian"          0.81
    " individually"  0.80
    " are"           0.80
    " originated"    0.79
    Act Density 0.023%

    No Known Activations