INDEX
    Explanations
    Negative Logits
    Token          Logit
    "1"            1.70
    (not shown)    1.50
    "in"           1.41
    "ol"           1.38
    "ot"           1.32
    "b"            1.32
    "re"           1.27
    "us"           1.27
    "v"            1.23
    "il"           1.23
    Positive Logits
    Token          Logit
    "ς"            1.36
    (not shown)    1.19
    (not shown)    1.15
    " be"          1.08
    "lığını"       1.06
    "dır"          1.05
    "所に"         1.05
    "ы"            1.05
    "اں"           1.04
    " clumsy"      1.04
    Activation Density: 0.012%

    No Known Activations