    Explanations

    the word "rather" and its variations indicating preference or comparison

    Negative Logits
    Token       Logit
    "ery"       -0.17
    "smith"     -0.17
    "sert"      -0.16
    "tar"       -0.15
    "sw"        -0.15
    "urator"    -0.15
    "system"    -0.15
    "ys"        -0.15
    "eat"       -0.15
    "entre"     -0.15
    Positive Logits
    Token       Logit
    " than"     0.18
    "-than"     0.16
    "ODE"       0.15
    " всего"    0.15
    "711"       0.15
    "서"        0.15
    "_than"     0.15
    "apy"       0.15
    "UNIX"      0.15
    "rière"     0.15
    Activation Density: 0.016%

    No Known Activations