INDEX
    NEGATIVE LOGITS
    " "               1.14
    " a"              1.09
    "ist"             1.09
    " i"              1.07
    " malicious"      1.05
    " maliciously"    0.98
    " for"            0.97
    " bruises"        0.95
    "for"             0.95
    " माध्यम"          0.94
    POSITIVE LOGITS
    "с"               1.37
    "в"               1.34
    "s"               1.32
    "ים"              1.21
    "ות"              1.19
    "ى"               1.19
    "к"               1.16
    "ко"              1.15
                      1.12
    "жи"              1.08
    Act Density 0.000%

    No Known Activations