INDEX
    Explanations

    references to societal blame and fault, particularly in relation to racial or cultural issues

    Negative Logits
    AxisAlignment    -0.88
     lediglich       -0.72
     tevens          -0.72
     تضيفلها         -0.70
    互联网档案馆      -0.69
    ')],             -0.68
     sought          -0.67
    に対し            -0.67
    >"+              -0.66
     hinweg          -0.66
    Positive Logits
     stuff        1.04
     fucking      0.87
     scared       0.83
     everybody    0.82
     freaking     0.81
     guys         0.80
     scary        0.80
     stupid       0.80
     thing        0.80
     freakin      0.78
    Activation Density 0.866%

    No Known Activations