INDEX
    Explanations

    words related to specific cases or instances that deviate from a general rule or norm

    references to exceptions in rules or guidelines

    Negative Logits
    yss            -0.74
    ebus           -0.74
     istg          -0.72
    âĸ¬            -0.71
    riz            -0.69
    æ©Ł            -0.68
    Delivery       -0.65
    legram         -0.65
     Heist         -0.65
     Dow           -0.64

    Positive Logits
    poons          0.93
     exceptions    0.93
    perty          0.89
    ervative       0.86
    ensical        0.85
    afety          0.81
    uba            0.81
    ppings         0.80
     loopholes     0.78
    cale           0.77

    Activation Density: 0.010%

    No Known Activations