INDEX
    Explanations

    the pronoun "he" within sentences

    Negative Logits
    "hips"            -0.96
    " GEAR"           -0.61
    "iatrics"         -0.59
    " intercept"      -0.56
    " Labrador"       -0.56
    " Psychiat"       -0.56
    "iott"            -0.55
    "INGTON"          -0.55
    " OPS"            -0.54
    " helicopters"    -0.54

    Positive Logits
    "rency"            1.08
    "ller"             1.03
    "nder"             1.00
    "lling"            0.99
    "atre"             0.97
    "gan"              0.93
    "lda"              0.92
    "aven"             0.92
    "ttes"             0.91
    "ppo"              0.90

    Act Density 0.033%

    No Known Activations