INDEX
    Explanations
    Negative Logits
    Token          Value
    " bahwa"       1.19
    (not shown)    1.16
    "ı"            1.04
    (not shown)    0.99
    "2"            0.98
    " maravill"    0.98
    " ว่า"          0.96
    " że"          0.95
    (not shown)    0.95
    " alcune"      0.95
    Positive Logits
    Token          Value
    "ين"           1.64
    "the"          1.49
    "."            1.43
    "("            1.23
    " ("           1.13
    "og"           1.11
    "ت"            1.10
    "ého"          1.09
    "{"            1.06
    "يرا"          1.05
    Activation Density: 0.006%

    No Known Activations