INDEX
    Explanations

    references to exceptions in rules, systems, or situations

    instances of the word "exception."

    Negative Logits
    Token         Logit
    "lab"         -0.69
    "ching"       -0.66
    "uld"         -0.65
    "yang"        -0.63
    "roc"         -0.63
    "roph"        -0.62
    "odium"       -0.62
    "Lab"         -0.61
    "cong"        -0.61
    "itance"      -0.60
    Positive Logits
    Token            Logit
    " exception"     1.09
    " exceptions"    1.06
    "except"         0.85
    "witz"           0.73
    " Exception"     0.73
    "ishly"          0.72
    "ptions"         0.71
    "DERR"           0.70
    " objections"    0.70
    "flake"          0.69
    Activation Density: 0.013%

    No Known Activations