INDEX
    Explanations

    phrases indicating negation or the absence of something

    Negative Logits
    Token            Logit
    "arse"           -0.15
    "öh"             -0.14
    "hlas"           -0.14
    "hn"             -0.13
    "ali"            -0.13
    "-utils"         -0.13
    "brids"          -0.13
    " hypotheses"    -0.13
    "ETS"            -0.12
    "uhan"           -0.12
    Positive Logits
    Token            Logit
    " denying"       0.26
    " question"      0.23
    " way"           0.23
    " disput"        0.23
    " reason"        0.23
    " need"          0.21
    " room"          0.21
    " deny"          0.20
    " guarantee"     0.20
    " telling"       0.20
    Act Density 0.048%

    No Known Activations