INDEX
    Explanations

    comparisons that emphasize preference or prioritization

    comparative phrases emphasizing preference or alternatives

    Negative Logits
    "ppo"           -0.75
    "amba"          -0.73
    "ruary"         -0.72
    "mberg"         -0.71
    "eria"          -0.71
    "ocaust"        -0.70
    "erto"          -0.70
    "uay"           -0.69
    "draft"         -0.69
    "adium"         -0.68
    Positive Logits
    " unimagin"      0.78
    " than"          0.69
    " innocuous"     0.69
    " Ide"           0.69
    " distinguish"   0.68
    " rather"        0.67
    " irrelevant"    0.67
    " metic"         0.67
    " amusing"       0.66
    " preferring"    0.65
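The dashboard does not state how these logit lists were produced. One common convention for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix, which gives one score per vocabulary token, and then report the most positive and most negative entries. The sketch below assumes that convention. The function names `logit_effects` and `top_and_bottom`, the toy direction, matrix, and vocabulary are hypothetical illustrations, not data or code from this page.

```python
from typing import List, Tuple

# Assumption: lists like "Positive Logits" / "Negative Logits" come from
# direction @ W_U, i.e. the feature's decoder vector times the unembedding.

def logit_effects(direction: List[float], w_u: List[List[float]]) -> List[float]:
    """Dot the direction (length d_model) with each vocab column of w_u (d_model x vocab)."""
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
    direction = [1.0, -0.5]
    w_u = [[0.2, 0.9, -0.4],
           [0.6, -0.2, 0.8]]
    vocab = [" than", " rather", "ppo"]
    effects = logit_effects(direction, w_u)
    top, bottom = top_and_bottom(effects, vocab, k=1)
    print("positive:", top)
    print("negative:", bottom)
```

Real dashboards may also normalize the direction or exclude special tokens before ranking; this sketch does neither.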
    Activation Density: 0.016%

    No Known Activations