INDEX
Explanations
references to societal blame and fault, particularly in relation to racial or cultural issues
Negative Logits
AxisAlignment    -0.88
lediglich        -0.72
tevens           -0.72
تضيفلها          -0.70
互联网档案馆      -0.69
')],             -0.68
sought           -0.67
に対し            -0.67
>"+              -0.66
hinweg           -0.66
Positive Logits
stuff            1.04
fucking          0.87
scared           0.83
everybody        0.82
freaking         0.81
guys             0.80
scary            0.80
stupid           0.80
thing            0.80
freakin          0.78
Activations Density 0.866%