    Explanations

    negative statements and the concept of impossibility

    Negative Logits
    â̦)↵↵       -0.08
    allon        -0.08
    umd          -0.08
    ransition    -0.07
    ="__         -0.07
    _mC          -0.07
    æ®Ĭ          -0.07
    à¸Ļวà¸Ļ       -0.07
    nung         -0.07
    код          -0.07
    Positive Logits
     fail        0.07
     deny        0.06
     harm        0.06
     fails       0.06
     miss        0.06
     Cotton      0.06
     ignore      0.06
     down        0.06
     not         0.06
     question    0.06
    Activation Density: 0.027%

    No Known Activations