INDEX
    Explanations

    phrases indicating comparison or evaluation, focusing on the outcome or result

    phrases that describe conditions or situations and their characteristic qualities

    Negative Logits
    Token       Logit
    "————"      -0.63
    "MQ"        -0.62
    " Doing"    -0.59
    "wheel"     -0.59
    "SPA"       -0.58
    "ugu"       -0.58
    "ulk"       -0.57
    "onday"     -0.56
    "Whe"       -0.56
    "ipping"    -0.56
    Positive Logits
    Token            Logit
    " resembles"     1.14
    " exceeds"       1.08
    " justifies"     1.01
    " contradicts"   0.99
    " mirrors"       0.97
    " inspires"      0.95
    " surpass"       0.95
    " undermines"    0.94
    " horr"          0.94
    " prevents"      0.93
    Activation Density: 0.132%

    No Known Activations