    Negative Logits
    "erest"           -0.73
    "itative"         -0.72
    " subp"           -0.69
    "PsyNetMessage"   -0.68
    "arios"           -0.66
    "VD"              -0.65
    "itement"         -0.64
    " Democr"         -0.63
    "76561"           -0.63
    "REDACTED"        -0.62
    Positive Logits
    "horn"             1.10
    " shoes"           1.10
    " Shoes"           0.99
    "bridge"           0.97
    "pee"              0.94
    "prints"           0.89
    "toe"              0.87
    "mens"             0.86
    " socks"           0.85
    "ets"              0.84
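Dashboards of this kind commonly produce logit lists like the two above by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the most negative and most positive entries. The following is a minimal pure-Python sketch that assumes this method. The function `logit_effects`, the toy vocabulary, and the toy matrices are hypothetical. They do not come from this page or from any particular library, and real dashboards may also apply extra steps such as a final layer norm.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction onto the
    unembedding (shape d_model x n_vocab) and return the k most negative
    and k most positive (token, logit) pairs."""
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")
    # logits[j] = sum_i decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    negative = sorted(pairs, key=lambda p: p[1])[:k]  # most suppressed tokens
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # most boosted
    return negative, positive


if __name__ == "__main__":
    # Toy data (hypothetical): 2-dim residual stream, 4-token vocabulary.
    vocab = [" shoes", " socks", "erest", " Democr"]
    decoder = [1.0, -0.5]
    unembed = [[1.0, 0.8, -0.6, -0.2], [0.0, 0.2, 0.4, 1.0]]
    neg, pos = logit_effects(decoder, unembed, vocab, k=2)
    print("negative:", [(t, round(v, 2)) for t, v in neg])
    print("positive:", [(t, round(v, 2)) for t, v in pos])
```

Tokens are kept verbatim, including their leading spaces, because tokens like " shoes" and "shoes" are distinct vocabulary entries.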
    Activation Density: 0.013%
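Activation density is the fraction of token positions at which the feature is active. A density of 0.013% is 0.00013, or roughly 13 of every 100,000 tokens. The sketch below is a minimal illustration that assumes "active" means an activation strictly above a threshold of 0. The helper `activation_density` and that threshold are hypothetical choices, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds
    `threshold` (assumed to be 0, i.e. the feature 'fires' when positive)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 13 of 100,000 positions.
    acts = [0.0] * 99_987 + [1.0] * 13
    density = activation_density(acts)
    print(f"{density * 100:.3f}%")
```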

    No Known Activations