INDEX
    Explanations

    the presence of conjunctions or phrases indicating contrast

    Head Attribution Weights (head: weight)
    0:0.08
    1:0.08
    2:0.09
    3:0.08
    4:0.09
    5:0.07
    6:0.08
    7:0.08
    8:0.09
    9:0.06
    10:0.07
    11:0.07
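The weights above presumably give the share of this feature attributed to each of the 12 attention heads (0 to 11). The sketch below is a minimal illustration under stated assumptions: the feature comes from an SAE trained on the concatenated per-head attention outputs, and each weight is the L2 norm of that head's slice of the feature's decoder vector divided by the sum of all slice norms. The helper `head_attribution_weights` is hypothetical, not the dashboard's own code.

```python
import math
from typing import List, Sequence


def head_attribution_weights(decoder_vec: Sequence[float], n_heads: int) -> List[float]:
    """Hypothetical helper: share of a decoder vector's norm attributed to each head.

    Assumes decoder_vec is the concatenation of n_heads equal-length per-head
    slices (head 0 first), as in an SAE trained on concatenated head outputs.
    """
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    # L2 norm of each head's slice of the decoder vector.
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        raise ValueError("decoder vector is all zeros")
    # Normalize so the weights sum to 1 across heads.
    return [n / total for n in norms]


print(head_attribution_weights([1.0, 0.0, 0.0, 3.0], n_heads=2))  # prints [0.25, 0.75]
```

Under this reading, twelve heads with roughly equal slice norms each get about 1/12 ≈ 0.083, which matches the pattern in the list above (every value between 0.06 and 0.09): no single head dominates the feature.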
    Negative Logits
    " Cosmetic"       -1.99
    " renamed"        -1.89
    " warr"           -1.87
    " banners"        -1.83
    " behav"          -1.83
    "$."              -1.82
    " Uniform"        -1.81
    " epit"           -1.80
    " Proud"          -1.80
    ";;;;;;;;;;;;"    -1.80
    Positive Logits
    "Todd"    2.19
    "ubes"    2.19
    "olars"   2.01
    "nos"     1.99
    "igma"    1.98
    "hare"    1.92
    "mares"   1.91
    "sites"   1.84
    "tails"   1.84
    "inity"   1.83
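The negative and positive logit lists presumably rank vocabulary tokens by how much the feature's output direction would lower or raise each token's logit if written straight to the unembedding. The sketch below illustrates that direct-logit calculation under stated assumptions: the decoder direction is already expressed in the residual stream (for an attention-output SAE it would first pass through the output projection W_O), and the final layer norm is ignored. `top_logit_tokens` and its arguments are hypothetical names.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: direct-logit effect of one feature direction.

    decoder_vec: feature direction in the residual stream (length d_model)
    unembed:     d_model x vocab_size matrix, one row per residual dimension
    vocab:       token strings, one per unembedding column
    """
    scores = []
    for tok_idx, token in enumerate(vocab):
        # Dot product of the feature direction with this token's unembedding column.
        score = sum(d * row[tok_idx] for d, row in zip(decoder_vec, unembed))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


pos, neg = top_logit_tokens([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)  # prints [('a', 2.0)] [('b', -1.0)]
```

Running this over a full vocabulary with k=10 would yield two ten-entry lists in the same shape as the ones above.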
    Activation Density: 0.000%

    No Known Activations
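Activation density is conventionally the percentage of sampled token positions on which the feature's activation is nonzero, so a density of 0.000% alongside "No Known Activations" presumably means the feature did not fire on any token in the sample the dashboard scanned. A minimal sketch, assuming a flat list of per-token activations and a firing threshold of zero; `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the feature
    activation is strictly greater than `threshold`."""
    if not activations:
        return 0.0  # no tokens sampled, so nothing can have fired
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Format the result the way the dashboard displays it.
print(f"{activation_density([0.0] * 1000):.3f}%")  # prints 0.000%
```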