INDEX
    Explanations

    negations and negative phrases

    Negative Logits
    overy     -0.18
    ayo       -0.16
    uzzi      -0.15
    uze       -0.14
    ogens     -0.14
    SSF       -0.13
    loth      -0.13
    etty      -0.13
    ughter    -0.13
    Quad      -0.13
    Positive Logits
     outlet   0.21
     sale     0.19
    -sale     0.18
     cheap    0.18
     online   0.18
    online    0.17
     Outlet   0.17
    discount  0.17
     Sale     0.16
    agli      0.16
    Act Density 0.033%

    No Known Activations