INDEX
    Explanations

    negations and expressions of disagreement or refutation

    Negative Logits
    Token        Logit
    "863"        -0.15
    "Alive"      -0.14
    "esktop"     -0.14
    "pto"        -0.14
    "à¹ĥห"      -0.14
    " either"    -0.14
    "either"     -0.14
    "нож"        -0.13
    "pite"       -0.13
    " Alive"     -0.13
    Positive Logits
    Token        Logit
    "isiyle"     0.15
    "ãĤ¸ãĤª"     0.14
    "Äħż"        0.14
    "ETS"        0.14
    "itution"    0.14
    "_LITERAL"   0.14
    "ensch"      0.13
    " just"      0.13
    " vice"      0.13
    "κÎŃ"        0.13
    Act Density 0.033%

    No Known Activations