INDEX
    Explanations

    comparisons and evaluations in text, focusing on expressions related to superiority or advancement

    Negative Logits
    Token            Logit
    "raq"            -1.09
    "autions"        -1.09
    "Sit"            -1.01
    "uctions"        -1.01
    "imity"          -0.97
    "oyer"           -0.96
    "eeper"          -0.95
    "negie"          -0.92
    "oking"          -0.92
    "Remove"         -0.91

    Positive Logits
    Token            Logit
    "average"         1.15
    " usual"          1.05
    " ours"           1.04
    " theirs"         1.00
    " average"        0.99
    " ordinary"       0.97
    " anticipated"    0.93
    " Gore"           0.91
    " anything"       0.90
    "usual"           0.89

    Activation Density: 1.989%

    No Known Activations