INDEX
    Explanations

    contractions involving "does not"

    negations or phrases indicating disagreement or denial

    Negative Logits
     Classification   -0.68
     protected        -0.65
     PU               -0.64
     learning         -0.63
     Carth            -0.62
     nearest          -0.62
     Butt             -0.60
     Letter           -0.60
     elimination      -0.60
     pockets          -0.59
    Positive Logits
    't                 1.64
    ÃŃ                 1.01
    ´                  0.98
    etsk               0.91
    uts                0.91
    n                  0.90
    ates               0.90
    acio               0.90
    itely              0.88
    eness              0.87
    Activation Density: 0.124%

    No Known Activations