INDEX
    Explanations
    NEGATIVE LOGITS
    "ाह"  -0.07
    "ایط"  -0.07
    "část"  -0.07
    "хід"  -0.07
    " आध"  -0.07
    "γχ"  -0.07
    "άσ"  -0.07
    ".ph"  -0.06
    "Tên"  -0.06
    " الشيخ"  -0.06
    POSITIVE LOGITS
    " sure"  0.13
    "Sure"  0.12
    " Sure"  0.11
    " surely"  0.11
    "sure"  0.10
    " Surely"  0.09
    "ure"  0.08
    " unsure"  0.08
    " clear"  0.07
    0.07
    Act Density 0.013%

    No Known Activations