INDEX
    Explanations

    politically charged words and phrases; specifically, it seems to highlight strong or forceful statements

    repeated characters or stylized characters in text

    Negative Logits
    " disadvant"   -0.86
    " Gutenberg"   -0.83
    " mathemat"    -0.81
    " misunder"    -0.70
    "inav"         -0.69
    "geries"       -0.69
    "merce"        -0.68
    " carbohyd"    -0.67
    " whiff"       -0.64
    "raviolet"     -0.63
    Positive Logits
    "ï¸ı"          1.17
    "uth"          0.95
    "女"           0.90
    "¯¯"           0.88
    "ï¸"           0.87
    "§"            0.87
    "ution"        0.81
    "ãģ®éŃĶ"       0.81
    "ãĤĭ"          0.80
    "··"           0.78
    Activation Density: 0.387%

    No Known Activations