    Explanations

    comparisons indicating improvement or preference

    comparative terms indicating preference or judgment

    Negative Logits
    Token            Logit
    "utm"            -0.75
    "eur"            -0.70
    "ulating"        -0.65
    "ategory"        -0.62
    "uly"            -0.62
    " Greenpeace"    -0.61
    "urer"           -0.61
    "inous"          -0.61
    "itory"          -0.61
    "naires"         -0.60
    Positive Logits
    Token            Logit
    " yet"           0.99
    " than"          0.93
    " Than"          0.88
    "than"           0.84
    "ment"           0.81
    " still"         0.79
    " behaved"       0.75
    " luck"          0.73
    " safe"          0.71
    "bye"            0.71
    Activation Density: 0.033%

    No Known Activations