    Negative Logits
        Token           Logit
        "DERR"          -0.71
        " Button"       -0.68
        "aged"          -0.64
        "AGE"           -0.60
        " squats"       -0.59
        " Columb"       -0.59
        "arsity"        -0.59
        " JFK"          -0.58
        " [];"          -0.58
        " Presidency"   -0.57
    Positive Logits
        Token           Logit
        "stal"           1.25
        "stals"          1.17
        "pter"           1.17
        "nda"            1.07
        "sta"            1.00
        "pheus"          0.97
        "tel"            0.95
        "sten"           0.94
        "tics"           0.94
        "gon"            0.93
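    Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes exactly that approach. The names `logit_effects`, `top_and_bottom`, `decoder_dir` and `unembed`, and all the toy numbers, are hypothetical, and the dashboard's actual procedure (for example, any normalization it applies) is not stated in the source.

```python
# Hypothetical sketch of a direct logit-effect ranking for one feature.
# Assumes the feature's decoder direction is projected straight onto the
# unembedding matrix W_U (d_model rows x vocab columns); the dashboard's
# exact procedure (e.g. any normalization) is not stated in the source.
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature direction with each vocabulary column of W_U."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(tokens, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with invented numbers: 2-dimensional residual stream, 4 tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, -0.4, 0.8, -0.2],
    [-0.5, 0.6, 0.0, 0.8],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, tokens, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With k=10 over a full vocabulary, the two returned lists would have the same shape as the Positive and Negative Logits lists above.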
    Activation Density: 0.038%
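    A density of 0.038% means the feature fires on roughly 38 of every 100,000 tokens. The sketch below computes density as the fraction of token positions whose activation exceeds a threshold. The function name `activation_density` and the default threshold of 0 are assumptions for illustration, not the dashboard's documented method.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where the feature's activation exceeds a threshold (assumed to be 0 here).
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation is above `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    return active / total if total else 0.0


# 38 firing positions out of 100,000 tokens gives 0.038%.
acts = [0.0] * 99_962 + [1.0] * 38
density = activation_density(acts)
print(f"{density:.3%}")
```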

    No Known Activations