INDEX
    Explanations

    actions related to expressing opinions, debating critical issues, and discussing social justice topics

    Negative Logits
    Token                     Logit
    " Bucs"                   -0.63
    " Uz"                     -0.60
    " wont"                   -0.59
    " Gh"                     -0.55
    " Pis"                    -0.55
    " Singh"                  -0.54
    "--------------------"    -0.54
    " Falcons"                -0.54
    " Goblin"                 -0.53
    "Dub"                     -0.53
    Positive Logits
    Token                     Logit
    " oneself"                1.49
    " ourselves"              0.86
    " yourself"               0.84
    "uate"                    0.83
    "enance"                  0.82
    "itate"                   0.80
    "entious"                 0.76
    " them"                   0.72
    " yourselves"             0.71
    "onom"                    0.70
    Activation Density: 0.187%

    No Known Activations