INDEX
    Explanations

    expressions of subjective judgments about morality and behavior

    NEGATIVE LOGITS
    \CMS
    -0.16
    ăn
    -0.15
    OLID
    -0.15
    nist
    -0.15
    ちゃ
    -0.15
    omas
    -0.14
     Screw
    -0.14
    μος
    -0.14
    emes
    -0.14
    è¬
    -0.14
    POSITIVE LOGITS
     posts
    0.20
     Posts
    0.18
     Thread
    0.18
    你的
    0.18
     straw
    0.17
    /thread
    0.17
     OP
    0.17
     troll
    0.16
     your
    0.16
     posted
    0.16
    Act Density 0.464%

    No Known Activations