INDEX
    Explanations

    phrases related to political speeches or statements

    repeated references to collective pronouns, particularly "we."

    Negative Logits
    Token             Value
    "tains"           -0.74
    "uces"            -0.71
    "advertisement"   -0.64
    "lights"          -0.63
    "ulence"          -0.60
    "VERTISEMENT"     -0.59
    "laughs"          -0.59
    "mund"            -0.59
    " Leopard"        -0.57
    "imal"            -0.57
    Positive Logits
    Token             Value
    "akening"         1.26
    " owe"            1.23
    "'re"             1.22
    " need"           1.19
    " cannot"         1.12
    "'ve"             1.11
    " must"           1.10
    " ought"          1.05
    "'ll"             1.00
    " deserve"        1.00
    Activation Density: 0.162%

    No Known Activations