INDEX
    Explanations

    words that express a sense of negation or non-conformity

    Negative Logits
    "isko"      -0.14
    "rai"       -0.13
    "rich"      -0.13
    "927"       -0.13
    " Orr"      -0.13
    " Clem"     -0.13
    "..."       -0.13
    "jie"       -0.13
    "ueva"      -0.13
    " Acres"    -0.13
    Positive Logits
    "atur"      0.19
    "erken"     0.17
    "oth"       0.17
    "(er"       0.17
    "anou"      0.17
    "alars"     0.15
    "à¹Ĩ"       0.15
    "theless"   0.15
    " facto"    0.15
    "åĵ"        0.15
    Act Density 0.109%

    No Known Activations