INDEX
Explanations
actions related to conflict and decisions involving moral dilemmas in narratives
Negative Logits
ovna          -0.15
905           -0.15
enco          -0.14
/categories   -0.14
ancers        -0.14
Withdraw      -0.14
Withdraw      -0.14
cede          -0.14
607           -0.14
unj           -0.14
Positive Logits
kill          0.37
kills         0.32
killing       0.28
killed        0.26
dispatch      0.25
kill          0.25
Kill          0.24
kills         0.23
Killing       0.23
Kills         0.23
Activations Density: 0.334%