INDEX
    NEGATIVE LOGITS
    "removed"      -0.09
    "isat"         -0.08
    "-first"       -0.08
    " Сол"         -0.08
    "absence"      -0.08
    "borrow"       -0.08
    "ttl"          -0.08
    "sonian"       -0.08
    " Juliet"      -0.08
    " نبود"        -0.08
    POSITIVE LOGITS
    " pund"        0.08
    " scattered"   0.08
    " brisk"       0.07
    " scroll"      0.07
    " windy"       0.07
    " przem"       0.07
    " scattering"  0.07
    ".concat"      0.07
    " stre"        0.07
    " nau"         0.07
    Activation Density: 0.001%

    No Known Activations