INDEX
    Explanations

    words related to politics and public figures

    expressions of social justice concerns and systemic issues

    Negative Logits
    uca               -0.62
    ction             -0.62
    rad               -0.60
     distraction      -0.57
    imity             -0.56
    activity          -0.56
    heit              -0.55
    uto               -0.55
     wanting          -0.55
    emies             -0.53

    Positive Logits
    ³³³               0.83
    ³³³³³³³³³³³³³³³³  0.75
    ³³³³³³³³          0.74
    ³³³³              0.69
    ECK               0.68
     WD               0.62
    reditary          0.61
    ↵Âł               0.61
    eki               0.59
    ????????          0.59
    Act Density 0.542%

    No Known Activations