INDEX
    Explanations

    Application rejections

    NEGATIVE LOGITS
    "-box"          -0.08
    ".↵"            -0.08
    " et"           -0.08
    " interaction"  -0.08
    "box"           -0.07
    "ung"           -0.07
    " A"            -0.07
    "-intensive"    -0.07
    " advantage"    -0.07
    "ناية"          -0.07
    POSITIVE LOGITS
    " nonetheless"     0.12
    " disappointment"  0.11
    " trotzdem"        0.11
    " comunque"        0.11
    " consolation"     0.10
    " сожал"           0.10
    " disappointed"    0.10
    "ご了承"            0.10
    " politely"        0.10
    " gracefully"      0.10
    Activation Density 0.056%

    No Known Activations