INDEX
    Explanations

    words related to personal preferences or choices

    references to personal preferences

    Negative Logits
    bane         -0.81
    mans         -0.76
    kj           -0.74
    mberg        -0.73
    wordpress    -0.72
    amaz         -0.71
    Sus          -0.69
    bold         -0.69
    WARN         -0.69
    Adams        -0.69
    Positive Logits
     preferences  1.07
    yip           0.96
     preference   0.92
     favoring     0.91
     elig         0.84
     favoured     0.80
     selection    0.77
     palate       0.76
     skew         0.74
     choice       0.74
    Activation Density: 0.012%

    No Known Activations