INDEX
    Explanations

    contractions combined with the negation "not"

    negations or phrases indicating disagreement

    Negative Logits
    " fixme"          -0.68
    "ership"          -0.66
    "IER"             -0.65
    " Evaluation"     -0.64
    "velt"            -0.61
    "ilage"           -0.61
    "inav"            -0.61
    "ilege"           -0.60
    " decency"        -0.59
    "cano"            -0.59

    Positive Logits
    " alone"           1.29
    " shy"             1.17
    " amused"          1.07
    " afraid"          1.02
    " immune"          1.01
    " necessarily"     0.96
    " exactly"         0.93
    " ashamed"         0.92
    " Alone"           0.92
    " kidding"         0.91

    Act Density 0.103%

    No Known Activations