INDEX
    Explanations

    negations and expressions of uncertainty

    Negative Logits
    Token         Logit
    "bee"         -0.15
    "imony"       -0.14
    "ans"         -0.14
    "pla"         -0.14
    "hev"         -0.14
    "anta"        -0.14
    " toJSON"     -0.14
    "krát"        -0.14
    "drv"         -0.14
    "ged"         -0.14
    Positive Logits
    Token         Logit
    "xious"       0.23
    "sey"         0.21
    " uncertain"  0.21
    "obs"         0.20
    "ont"         0.19
    " stretch"    0.18
    "oses"        0.18
    " small"      0.17
    "seg"         0.17
    "okie"        0.17
    Activation Density: 0.037%

    No Known Activations