INDEX
    Explanations

    themes related to political discourse and criticism of societal norms

    Negative Logits
    entina
    -0.17
    iller
    -0.17
     stron
    -0.15
    illard
    -0.15
    ibold
    -0.15
    WithValue
    -0.15
    ailable
    -0.14
    irket
    -0.14
    alah
    -0.14
    achu
    -0.14
    Positive Logits
     claim
    0.23
     claims
    0.22
     claiming
    0.21
     claimed
    0.21
     Claim
    0.19
     saying
    0.18
     CLAIM
    0.18
     Claims
    0.18
     argument
    0.17
     complain
    0.17
    Act Density 0.350%

    No Known Activations