INDEX
    Explanations

    negations or expressions of disapproval

    Negative Logits
    "<bos>"         -1.92
    " intersper"    -1.18
    " amass"        -0.84
    "/***\n"        -0.80
    " endow"        -0.79
    " disarm"       -0.77
    " rouse"        -0.76
    " disambigu"    -0.72
    " vanqu"        -0.72
    " acquaint"     -0.70
    Positive Logits
    "should"        0.85
    " Should"       0.82
    "Should"        0.81
    " should"       0.80
    " SHOULD"       0.75
    "hould"         0.69
    " shouldn"      0.68
    "bekah"         0.60
    " noten"        0.60
    "cautionary"    0.60
    Activation Density: 0.113%

    No Known Activations