    Explanations

    phrases indicating some level of certainty or comparison, often involving the phrase "at least."

    phrases indicating a minimum or a baseline condition

    Negative Logits
    Token       Logit
    "rence"     -0.74
    " FANT"     -0.70
    "bath"      -0.69
    "rend"      -0.67
    "icides"    -0.65
    "iard"      -0.64
    "ーク"      -0.63
    "bern"      -0.63
    "ses"       -0.61
    "shr"       -0.60

    Positive Logits
    Token               Logit
    " partly"           0.74
    "uner"              0.71
    " partially"        0.69
    "fair"              0.69
    " judging"          0.69
    "ety"               0.67
    " toler"            0.65
    " theoretically"    0.65
    " temporarily"      0.63
    "een"               0.62

    Act Density 0.023%
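
    The page does not say how the activation density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation exceeds zero; the function name, the threshold, and the toy data are hypothetical and chosen only to reproduce a figure of the same size:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "Act Density" means active positions / total positions.
    """
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / total


# Toy data (not from the source): 100,000 positions, 23 of them active.
acts = [1.5] * 23 + [0.0] * (100_000 - 23)
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.023% corresponds to roughly 23 active positions per 100,000 tokens, so the feature fires rarely.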

    No Known Activations