INDEX
    Explanations
    Negative Logits
    ley  1.12
    lip  1.08
    ka  1.05
    ese  1.03
    ali  1.02
    man  1.01
    rip  0.98
    ree  0.98
    (no token shown)  0.98
    ur  0.97
    Positive Logits
    ח  1.13
     öğrenciler  1.10
    ות  1.08
    ס  1.06
    ל  1.05
     superheroes  1.03
     mundo  1.02
     siedz  1.02
    (no token shown)  1.01
     allemand  1.00
    Activation Density: 0.009%

    No Known Activations