    Negative Logits
    Token            Logit
    "nds"            -0.09
    " flattering"    -0.08
    (blank)          -0.08
    " seinen"        -0.08
    " silky"         -0.08
    " fruition"      -0.07
    " rosas"         -0.07
    "viet"           -0.07
    " frase"         -0.07
    " peng"          -0.07
    Positive Logits
    Token            Logit
    "ාව"             0.09
    "ares"           0.08
    " Ortega"        0.08
    ".axis"          0.08
    "zheimer"        0.07
    " ಮುಖ"           0.07
    " සිය"           0.07
    "claimer"        0.07
    " najbardziej"   0.07
    (blank)          0.07
    Activation Density: 0.011%

    No Known Activations