INDEX
    Explanations

    supportive language and words related to advocating for a specific cause or belief

    references to a specific cause or movement

    Negative Logits
    "aeper"         -0.78
    "ault"          -0.75
    " Leopard"      -0.73
    "Ku"            -0.69
    " Pione"        -0.69
    "olitan"        -0.68
    " Sheep"        -0.68
    " lav"          -0.66
    " Centers"      -0.66
    " Technique"    -0.65
    Positive Logits
    " cele"         1.27
    "cause"         0.81
    " Cause"        0.79
    "way"           0.76
    " celeb"        0.74
    "facts"         0.73
    "wagon"         0.71
    "forge"         0.71
    "DNA"           0.71
    "fare"          0.70
    Activation Density: 0.028%

    No Known Activations