INDEX
    Explanations

    sentences ending with a specific structure of punctuation

    toxic or harmful statements and concepts

    Negative Logits
     Manhattan
    -0.74
     Glou
    -0.71
     Syd
    -0.67
     Somerset
    -0.67
     scene
    -0.64
     Whit
    -0.64
     Roc
    -0.59
     reception
    -0.59
     Shattered
    -0.59
     RAD
    -0.58
    Positive Logits
    ¬
    1.00
    âĢł
    0.90
    agree
    0.85
    Ĵ
    0.85
    âĹ¼
    0.83
    ¯
    0.82
    ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ
    0.82
      
    0.80
    §
    0.78
    ú
    0.78
    Act Density 0.230%

    No Known Activations