    Explanations

    the word "more" and its variations suggesting an additive or comparative sense

    Negative Logits
    Token              Logit
    " well"            -0.15
    "eniable"          -0.15
    "owed"             -0.14
    "eyse"             -0.14
    "atic"             -0.14
    " FAR"             -0.14
    " addCriterion"    -0.14
    " quite"           -0.13
    "adele"            -0.13
    "uby"              -0.13
    Positive Logits
    Token              Logit
    " times"           0.22
    "ingly"            0.20
    "times"            0.19
    " during"          0.18
    "次"               0.18
    " veces"           0.17
    " than"            0.17
    "/to"              0.17
    " TIMES"           0.16
    "/at"              0.16
    Activation Density: 0.095%

    No Known Activations