    Explanations

    phrases indicating quantity or frequency

    Negative Logits
    Token          Logit
    "iors"         -0.18
    "combe"        -0.15
    "ceased"       -0.15
    "illes"        -0.15
    "oders"        -0.14
    " confines"    -0.14
    "orne"         -0.14
    "hy"           -0.14
    "ble"          -0.14
    "ialis"        -0.14

    Positive Logits
    Token          Logit
    "tery"         0.26
    "to"           0.21
    "nict"         0.20
    "ting"         0.19
    " fewer"       0.19
    "tern"         0.19
    "んど"         0.18
    "tering"       0.17
    " more"        0.17
    "TA"           0.16
    Act Density 0.032%

    No Known Activations