INDEX
    Explanations

    words related to social or political causes

    references to causes related to various issues and events

    Negative Logits
        "aeper"      -0.81
        "Ku"         -0.72
        " Seym"      -0.70
        "PDATE"      -0.69
        "illet"      -0.69
        " Leopard"   -0.68
        "olitan"     -0.68
        "aturdays"   -0.67
        " lav"       -0.66
        "ilings"     -0.63

    Positive Logits
        " cele"      1.32
        "cause"      0.87
        "way"        0.78
        "forge"      0.74
        "wagon"      0.71
        "facts"      0.70
        "vier"       0.70
        " celeb"     0.70
        "ality"      0.70
        "fare"       0.70

    Activation Density: 0.030%

    No Known Activations