INDEX
    Explanations

    negations or refusal expressions in sentences

    NEGATIVE LOGITS
    Token             Logit
    "CI"              -0.76
    " è£ıè"           -0.67
    "å½"              -0.67
    "ItemImage"       -0.66
    " referen"        -0.63
    "WithNo"          -0.62
    "velop"           -0.62
    "grounds"         -0.62
    "jected"          -0.60
    " beginnings"     -0.60

    POSITIVE LOGITS
    Token             Logit
    " necessarily"     1.08
    " lose"            1.03
    " bother"          1.00
    " compete"         1.00
    " decide"          0.98
    " seem"            0.98
    " get"             0.96
    " hurry"           0.93
    " quit"            0.92
    " gotta"           0.92
    Act Density 0.040%
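
    For readers unfamiliar with the "Act Density" figure, the sketch below shows one common way such a number is computed: the fraction of tokens on which the feature's activation is above zero. This is an assumption about the metric, not something the page states; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 4 active tokens out of 10,000.
toy_activations = [0.0] * 9996 + [1.3, 0.2, 2.7, 0.9]

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.040%
```

    Under that reading, a density of 0.040% means the feature fires on roughly 4 tokens in every 10,000, which would make it a sparse feature.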

    No Known Activations