INDEX
Explanations
themes of social justice and moral judgment
Negative Logits
ulis      -0.14
.shared   -0.14
enus      -0.14
akov      -0.13
.Shared   -0.13
Laud      -0.13
agem      -0.13
shared    -0.13
bonne     -0.13
zz        -0.13
Positive Logits
fü        0.14
ingham    0.14
ÑĩеÑĢ      0.14
vat       0.14
ä»Ķ       0.14
·         0.14
CRET      0.14
510       0.14
ÑĹ        0.14
ضÙĬ      0.13
Activations Density: 0.342%