    Explanations

    "only" in different languages

    Negative Logits
    "unsafe"        -0.08
    "ریق"           -0.08
    "acol"          -0.08
    "acent"         -0.08
    "涉嫌"          -0.08
    (not shown)     -0.08
    "��"            -0.08
    "άν"            -0.08
    "unene"         -0.08
    " wrongful"     -0.08

    Positive Logits
    " مجرد"         0.13
    " केवल"         0.11
    " merely"       0.11
    " 그냥"         0.11
    " plain"        0.11
    " instead"      0.11
    " فقط"          0.11
    " শুধ"          0.11
    " apenas"       0.10
    " uniquement"   0.10

    Act Density 0.150%

    No Known Activations