    Negative Logits
        " יש"               -0.08
        " Yuan"             -0.08
        [token not shown]   -0.08
        " Ade"              -0.08
        "brown"             -0.08
        "联盟"              -0.07
        " הרב"              -0.07
        " organs"           -0.07
        " Stav"             -0.07
        " Tehran"           -0.07
    Positive Logits
        " Hemingway"        0.09
        " pensioen"         0.08
        " elegance"         0.08
        " elegantly"        0.08
        " Beaut"            0.08
        " elegante"         0.08
        " cottage"          0.07
        " grazing"          0.07
        " vacation"         0.07
        " piccolo"          0.07
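The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The page does not say how these numbers are computed. The sketch below assumes a common approach: a token's logit effect is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists are the k smallest and k largest effects. The names `logit_effects`, `top_logits`, the tokens `tok_a` to `tok_c`, and all vectors are hypothetical toy values, not real model weights.

```python
import heapq

# Hypothetical sketch: assumes a feature's logit effect on a token is the dot
# product of the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir, unembed):
    """Map each token to the dot product decoder_dir . unembed[token]."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_logits(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive

# Toy 2-dimensional example with made-up tokens and vectors (not real weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "tok_a": [0.08, 0.30],
    "tok_b": [-0.07, 0.50],
    "tok_c": [0.07, -0.10],
}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(effects, k=1)
print("negative:", negative)
print("positive:", positive)
```

With `k=1` this prints `tok_b` as the most negative token and `tok_a` as the most positive. A dashboard would do the same ranking over the full vocabulary with `k=10`.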
    Activation Density 0.066%
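The density figure is easiest to read as a fraction of token positions. The sketch below assumes density is the percentage of sampled token positions where the feature's activation is strictly above a threshold (0 by default). The page states neither the threshold nor the sample size, and the function name `activation_density` and the 100,000-token sample are hypothetical.

```python
# Sketch: assumes "activation density" is the percentage of sampled token
# positions where the feature's activation exceeds a threshold (0 by default).
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation is above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 66 firing positions out of 100,000 tokens.
sample = [0.0] * 99_934 + [1.5] * 66
print(f"{activation_density(sample):.3f}%")  # prints 0.066%
```

Under this reading, 0.066% means the feature fires on roughly 1 token in 1,500.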

    No Known Activations