INDEX
    Explanations

    expressions of denial or refusal

    Negative Logits
    Token          Logit
    " AppCompat"   -0.71
    "गत"           -0.68
    " ‘"           -0.63
    " Matth"       -0.60
    "ไง"           -0.59
    "姆斯"         -0.59
    "artament"     -0.59
    "Schw"         -0.59
    "duct"         -0.58
    "hoga"         -0.58
    Positive Logits
    Token          Logit
    " Deny"        1.65
    " denies"      1.54
    " deny"        1.52
    " denial"      1.42
    " denied"      1.39
    " Denial"      1.39
    " denying"     1.38
    "deny"         1.36
    " Denied"      1.36
    "Deny"         1.32
    Activation Density: 0.035%

    No Known Activations