Explanations
terms related to ethical and moral judgments
Negative Logits
DEM          -0.57
recoil       -0.55
moratorium   -0.54
unemployed   -0.52
rily         -0.51
curls        -0.50
Const        -0.50
withd        -0.50
)]           -0.49
skelet       -0.49
Positive Logits
uary         0.78
ordering     0.74
ivating      0.72
onduct       0.70
odon         0.70
inction      0.68
Studio       0.68
aminer       0.67
urai         0.66
odo          0.66
Activations Density 0.136%