INDEX
    Explanations

    words or prefixes related to negation or reversing actions

    words that indicate a lack or negation

    Negative Logits
    OPLE         -1.04
    hetti        -0.82
     Dynamics    -0.81
    jriwal       -0.79
    anwhile      -0.78
    ORY          -0.75
    utical       -0.74
    ーティ       -0.74
    ħĭ           -0.72
    uyomi        -0.71
    Positive Logits
    balanced     1.15
    cles         1.10
    confirmed    1.10
    apolog       1.10
    ifying       1.09
    assuming     1.09
    classified   1.05
    leased       1.05
    rep          1.03
    ruly         1.02
    Activation Density: 0.027%
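The density figure is the share of sampled tokens on which this feature fires. The sketch below shows that calculation, assuming "fires" means the activation is strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the sample counts are illustrative only and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (#tokens with act > threshold) / (#tokens).
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    # Express the fraction as a percentage.
    return 100.0 * firing / len(activations)


# Hypothetical sample: 27 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(27):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.027%
```

At a density this low, the feature fires on roughly 27 tokens per 100,000.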

    No Known Activations