INDEX
    NEGATIVE LOGITS
    Token          Logit
    " I"           1.16
    "I"            0.85
    " announc"     0.80
    " "            0.78
    "د"            0.78
    " It"          0.78
    "y"            0.76
    "أ"            0.76
    (not shown)    0.76
    (not shown)    0.73
    POSITIVE LOGITS
    Token          Logit
    "ные"          1.45
    " as"          1.38
    " are"         1.13
    "{"            1.04
    "‌ها"           1.03
    "нда"          1.02
    " were"        1.01
    "é"            1.01
    "ق"            1.01
    "dır"          1.00
    Activation Density: 0.000%

    No Known Activations