INDEX
    Explanations

    negations and words related to restrictions

    Negative Logits
    ίοÏĤ        -0.15
    OUS         -0.14
    eut         -0.14
     envy       -0.14
    azz         -0.14
    gree        -0.14
    overrides   -0.14
    611         -0.14
     fully      -0.14
     vis        -0.13
    Positive Logits
    uese        0.15
    å¤          0.15
     relent     0.15
    naments     0.14
    tl          0.14
    íħ          0.14
    aben        0.14
    IFI         0.14
    entai       0.14
    atcher      0.14
    Act Density 0.121%

    No Known Activations