INDEX
    Explanations

    statements/denials

    The neuron fires on tokens in formal denial or disclaimer language, such as “defamatory,” “never,” and “has,” along with other words in statements that assert innocence or refute claims.

    Negative Logits
    "드로"                  -0.07
    " Kata"                 -0.06
    "ीदव"                   -0.06
    " guise"                -0.06
    "-focused"              -0.06
    "igrated"               -0.06
    ".Auth"                 -0.06
    " basic"                -0.06
    "เล"                    -0.06
    ".Project"              -0.06
    Positive Logits
    " approaching"          0.07
    " `{"                   0.06
    "_css"                  0.06
    " ناب"                  0.06
    "route"                 0.06
    " pprint"               0.06
    (token not rendered)    0.06
    "getY"                  0.06
    " จำนวน"                0.06
    "����"                  0.06
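    The two logit lists rank vocabulary tokens by a signed score. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the neuron's output weight vector through the model's unembedding matrix and keep the lowest and highest scores. The sketch does that with NumPy; `top_bottom_logits`, `w_out`, `W_U`, and the toy numbers are hypothetical, and it ignores the final layer norm a real model applies before unembedding.

```python
import numpy as np


def top_bottom_logits(w_out, W_U, vocab, k=10):
    """Rank tokens by how strongly a neuron's output direction pushes
    their logits down (negative) or up (positive).

    Assumption: scores are a plain projection w_out @ W_U with no
    layer norm folded in.
    """
    # w_out has shape (d_model,); W_U has shape (d_model, d_vocab),
    # so the product gives one score per vocabulary token.
    scores = w_out @ W_U
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.7],
                [0.3, 0.3, 0.3, 0.3]])

negative, positive = top_bottom_logits(w_out, W_U, vocab, k=2)
print(negative)  # [('d', -0.7), ('b', -0.2)]
print(positive)  # [('a', 0.5), ('c', 0.1)]
```

    Because only the first row of the toy `W_U` meets a nonzero entry of `w_out`, the scores are just that row, which makes the ranking easy to check by eye.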
    Act Density 0.027%
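    Act Density is presumably the fraction of token positions on which the neuron is active; 0.027% works out to about 27 firing positions per 100,000 tokens. A minimal sketch, assuming “active” means an activation strictly above zero (the threshold actually used for this page is not stated); `activation_density` and the data are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds
    `threshold` (assumed to be 0.0; the real threshold may differ)."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size


# Hypothetical data: 27 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:27] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.027%"
```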

    No Known Activations