    Explanations

    negations and expressions of doubt or uncertainty

    Negative Logits
    Token         Logit
    "384"         -0.16
    "nty"         -0.16
    "okers"       -0.14
    "antor"       -0.14
    " fairly"     -0.14
    "ubl"         -0.14
    "ran"         -0.14
    "xit"         -0.13
    "adf"         -0.13
    "bj"          -0.13

    Positive Logits
    Token         Logit
    " THAT"        0.41
    "_that"        0.33
    " TOO"         0.33
    " that"        0.32
    "that"         0.32
    " too"         0.31
    "That"         0.30
    "too"          0.29
    " že"          0.29
    "\tthat"       0.28
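
The page does not say how these logit lists were produced. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below uses hypothetical helper names (`logit_attribution`, `top_and_bottom`) and made-up toy numbers; it is an illustration of that technique, not the pipeline behind this page.

```python
from typing import List, Sequence, Tuple


def logit_attribution(decoder_dir: Sequence[float],
                      unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with each
    column of the unembedding matrix (d_model rows x vocab_size columns)."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring
    (negative) tokens, each list ordered from most to least extreme."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


# Toy example (hypothetical values): d_model = 2, a four-token vocabulary.
vocab = [" that", " too", " maybe", "xit"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.3, 0.2, 0.0, -0.1],
    [0.2, 0.2, 0.0, -0.1],
]
scores = logit_attribution(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, vocab, k=2)
print(positive)
print(negative)
```

Under this assumption, a token such as " THAT" ranks first because its unembedding column points most nearly in the same direction as the feature's decoder vector.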
    Activation density: 0.117%
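
Activation density is usually read as the share of tokens on which the feature fires; 0.117% corresponds to roughly 117 of every 100,000 tokens. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero (the page does not state the threshold), and `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not taken from the dashboard)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Two of eight tokens fire, so the density is 25%.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"{density:.3f}%")  # prints 25.000%
```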

    No Known Activations