INDEX
    Explanations

    negations combined with specific locations or contexts

    phrases indicating exceptions or limitations

    Negative Logits
    "omas"      -0.68
    "icides"    -0.67
    "bath"      -0.66
    "shown"     -0.65
    "ihar"      -0.62
    "icide"     -0.61
    "inea"      -0.61
    "inus"      -0.59
    "ao"        -0.59
    "ubi"       -0.58
    Positive Logits
    "vous"      0.66
    "ecast"     0.63
    "hap"       0.62
    " Admir"    0.61
    "atican"    0.59
    " ones"     0.58
    "former"    0.58
    "owitz"     0.58
    "wives"     0.57
    " part"     0.56
    Activation Density: 0.062%

    No Known Activations