INDEX
    Explanations

    words related to restrictions or prohibitions

    words related to negative or limiting conditions

    Negative Logits
    Token            Logit
    "mble"           -0.67
    "ãĥ¼ãĥĨ"         -0.63
    "iffin"          -0.61
    " Enhancement"   -0.60
    "etts"           -0.57
    "rane"           -0.57
    "irez"           -0.57
    "oult"           -0.56
    " conver"        -0.54
    "arbon"          -0.54

    Positive Logits
    Token            Logit
    " whatsoever"     0.94
    "urnal"           0.92
    "onsense"         0.80
    "mber"            0.76
    "ilings"          0.70
    "hawk"            0.68
    " Chomsky"        0.66
    "omi"             0.65
    " Pradesh"        0.64
    "ody"             0.63
    Activation Density: 0.070%

    No Known Activations