    NEGATIVE LOGITS
    de        1.79
    ya        1.52
    op        1.49
    KI        1.46
    Kalau     1.42
              1.38
    ף         1.36
    不懂      1.35
    KA        1.34
    ñal       1.32
    POSITIVE LOGITS
    ز         1.62
    INH       1.55
     имена    1.50
    گذاری     1.46
    मात्र     1.41
     gọi      1.38
    iere      1.38
     voda     1.38
     имя      1.37
    РИ        1.35
    Activation Density: 0.251%

    No Known Activations