INDEX
    Explanations
    Negative Logits
    " Bates"     -0.07
    "(GET"       -0.07
    " lowered"   -0.07
    " lowers"    -0.06
    " ngOn"      -0.06
    ".Change"    -0.06
    " sito"      -0.06
    " γρα"       -0.06
    " Cohen"     -0.06
    " magnet"    -0.06
    Positive Logits
    " Wild"      0.11
    "wild"       0.08
    "Wild"       0.08
    "-web"       0.08
    " wild"      0.08
    " Wildcats"  0.06
    " Sk"        0.06
    " Indian"    0.06
    "艺术"       0.06
    "ill"        0.06
    Act Density 0.009%

    No Known Activations