    Negative Logits
    oler              -0.09
    onte              -0.08
    veral             -0.08
    riterion          -0.08
     pleased          -0.08
     مقد              -0.08
     flare            -0.07
    onar              -0.07
    rite              -0.07
     grazie           -0.07

    Positive Logits
     postcards         0.08
    Normalized         0.08
                       0.07
                       0.07
    自然               0.07
    Normalization      0.07
     brukt             0.07
                       0.07
    urals              0.07
     naturellement     0.07
    Act Density 0.000%

    No Known Activations