    Negative Logits
    Token        Logit
    " WV"        -0.08
                 -0.07
    "_unix"      -0.07
    " Up"        -0.07
    "-K"         -0.07
    " подк"      -0.07
    "WK"         -0.07
    ".Tween"     -0.06
    " up"        -0.06
    " open"      -0.06
    Positive Logits
    Token          Logit
    " Saturday"    0.08
    "utherford"    0.07
    "agner"        0.07
    "attached"     0.07
    "Saturday"     0.07
                   0.07
    " homer"       0.07
    "atches"       0.06
    " partir"      0.06
    "foy"          0.06
    Activation Density: 0.009%

    No Known Activations