    Explanations

    reflexive pronouns

    Negative Logits
    Token             Logit
    " itself"         -1.93
    "itself"          -1.91
    " Itself"         -1.80
    "themselves"      -1.70
    " herself"        -1.66
    " themselves"     -1.66
    " himself"        -1.61
    "himself"         -1.54
    " itſelf"         -1.44
    "herself"         -1.42
    Positive Logits
    Token             Logit
    "↵↵"              0.54
    ","               0.49
    "<eos>"           0.47
    "'"               0.47
    (blank)           0.46
    " suffices"       0.44
    "TestingModule"   0.44
    " ("              0.44
    " prescribes"     0.43
    "gdx"             0.43
    Activation Density: 0.097%

    No Known Activations