INDEX
Explanations
discussions related to crime and harm perception
Negative Logits
Bias        -0.16
evolution   -0.15
rei         -0.15
Bias        -0.14
tribal      -0.14
man         -0.14
bias        -0.14
Tribal      -0.14
Ecc         -0.14
Evolution   -0.13
Positive Logits
disc        0.22
Fou         0.19
neoliberal  0.18
Contest     0.18
spaces      0.18
spaces      0.18
valor       0.17
Scalar      0.17
meanings    0.17
spatial     0.17
Activations Density 0.384%