INDEX
    Explanations
    Negative Logits
    Token            Logit
    "lict"           -0.08
    " suspicious"    -0.07
    "passport"       -0.07
    " shiny"         -0.07
    " stared"        -0.07
    "Susp"           -0.07
    " Shed"          -0.07
    " foolish"       -0.07
    " outfit"        -0.07
    " sher"          -0.07
    Positive Logits
    Token            Logit
    " ww"            0.08
    "atás"           0.07
    " Ee"            0.07
    " alike"         0.07
    " programm"      0.07
    " pinch"         0.07
    "μβ"             0.07
    " Albuquerque"   0.07
    "ibilities"      0.07
    "alsa"           0.07
    Activation Density: 0.023%

    No Known Activations