    Explanations

    instances of words related to social issues and controversies, particularly those related to justice, power, and evidence

    terms and phrases related to discomfort and societal issues

    Negative Logits
    Tes               -0.73
    emale             -0.64
    Ultra             -0.63
    Teen              -0.61
    Ay                -0.60
    cientious         -0.59
    joining           -0.59
    senal             -0.58
     Seventh          -0.57
    Congratulations   -0.57
    Positive Logits
     ain              0.81
    "?                0.77
    ?                 0.76
     refers           0.74
    !?                0.74
    ...?              0.73
    ???               0.72
    ":["              0.72
    ?!                0.72
     equals           0.71
    Act Density 0.677%

    No Known Activations