    Explanations

    conjunctions 'or' and 'and' signaling contrasting or additive relationships

    phrases indicating moral comparisons between good and bad

    Negative Logits
    Token          Logit
    "ivari"        -0.71
    "quit"         -0.70
    " LEVEL"       -0.68
    " veins"       -0.68
    "igham"        -0.66
    "brates"       -0.66
    "ĸļ"           -0.64
    "sbm"          -0.64
    "igraph"       -0.63
    "aturday"      -0.63
    Positive Logits
    Token          Logit
    " evil"        1.13
    " bad"         1.09
    "evil"         1.08
    " brightest"   1.00
    "Evil"         0.98
    "bad"          0.89
    " Evil"        0.86
    " wrong"       0.84
    " evils"       0.83
    " BAD"         0.83
    Activation Density: 0.099%

    No Known Activations