    Explanations

    phrases related to consequences and actions, especially those involving societal or political implications

    references to abstract concepts or issues related to societal problems

    Negative Logits
        Token         Logit
        " Yel"        -0.68
        " Spoon"      -0.66
        " Pirates"    -0.63
        "estern"      -0.59
        " marines"    -0.57
        " Sims"       -0.57
        "aughed"      -0.57
        " Aluminum"   -0.56
        "UGH"         -0.56
        " Suzuki"     -0.56
    Positive Logits
        Token         Logit
        " oneself"    0.99
        "self"        0.97
        " ourselves"  0.93
        "selves"      0.87
        "chy"         0.84
        "alian"       0.84
        " anew"       0.83
        " yourself"   0.81
        " selves"     0.80
        "obe"         0.80
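    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below shows that projection under stated assumptions: the decoder vector `w_dec` has length d_model, the unembedding `W_U` is laid out as d_model rows by vocab_size columns, and the function name `top_logits` and the toy weights are hypothetical, not this feature's actual parameters or the dashboard's own code.

```python
def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumed layout: w_dec has length d_model; W_U is d_model rows x vocab_size columns.
    """
    d_model = len(w_dec)
    # logit[j] = sum_i w_dec[i] * W_U[i][j]: the feature's direct effect on token j
    logits = [sum(w_dec[i] * W_U[i][j] for i in range(d_model)) for j in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda j: logits[j])  # ascending by logit
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return positive, negative

# Hypothetical toy weights: a 3-token vocabulary and d_model = 2.
vocab = [" oneself", " Yel", "obe"]
W_U = [[1.0, -1.0, 0.2],
       [0.0,  0.5, 0.3]]
w_dec = [0.9, 0.1]
pos, neg = top_logits(w_dec, W_U, vocab, k=1)
print(pos, neg)
```

    Tokens are kept as quoted strings because a leading space marks a distinct token (" selves" and "selves" above are different entries).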
    Activation Density: 0.215%
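    Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.215% is roughly 2 tokens in every 1,000. A minimal sketch, assuming activations arrive as a flat list of floats and that "fires" means strictly above a threshold of 0 (the threshold, the function name, and the toy data are all assumptions):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed '> 0' rule)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical toy data: 2 active tokens out of 1,000.
acts = [0.0] * 998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.200%
```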

    No Known Activations