    Explanations

    words related to preference or support

    instances of the word "favor" and its variations, indicating preference or support

    Negative Logits
    Token         Logit
    "sis"         -0.88
    "ı"           -0.83
    "pt"          -0.80
    "thur"        -0.79
    "mberg"       -0.77
    "RT"          -0.76
    "hid"         -0.76
    "att"         -0.75
    "gren"        -0.75
    "raz"         -0.72
    Positive Logits
    Token         Logit
    " favored"    1.16
    "itism"       1.07
    " favoring"   0.99
    " favors"     0.93
    " favoured"   0.93
    "nesday"      0.86
    " favorites"  0.86
    " whipping"   0.80
    " hitters"    0.74
    " favorable"  0.73
    Activation Density: 0.008%

    No Known Activations