INDEX
    Explanations

    phrases questioning or challenging societal norms or behaviors

    phrases that express denial or rejection of certain actions or beliefs

    Negative Logits
    " Vers"        -0.67
    "uled"         -0.65
    "verning"      -0.64
    " Province"    -0.59
    " Renew"       -0.58
    "orsche"       -0.56
    "Located"      -0.56
    "iversary"     -0.56
    "ouver"        -0.56
    "entials"      -0.56
    Positive Logits
    " trolling"        0.78
    " shaming"         0.77
    " cynicism"        0.76
    " misguided"       0.75
    " frankly"         0.74
    " honestly"        0.74
    "instead"          0.73
    " subconscious"    0.72
    " goddamn"         0.72
    " trolls"          0.71
    Activation Density 1.377%

    No Known Activations