    Explanations

    controversial political topics and statements related to social issues

    Negative Logits
    Token               Logit
    "anwhile"           -0.82
    " Shutterstock"     -0.75
    " tremend"          -0.75
    "ortium"            -0.73
    " seiz"             -0.73
    " anecd"            -0.72
    " sadly"            -0.72
    " mathemat"         -0.70
    " patched"          -0.70
    " conduc"           -0.70
    Positive Logits
    Token               Logit
    "Ĵ"                 1.05
    "¡"                 1.05
    "ĸ"                 1.04
    "ħ"                 1.02
    "ĩ"                 1.02
    "į"                 1.02
    "ĥ"                 1.01
    "¬"                 1.01
    "ľ"                 0.99
    "Ĥ"                 0.96
    Activation Density: 0.179%

    No Known Activations