Explanations
terms related to punishment and revenge
Negative Logits
imanapun        -0.58
Wikimedijinoj   -0.56
Statuses        -0.56
postData        -0.55
rfloor          -0.55
grø             -0.54
OrderStatus     -0.54
########.       -0.53
않았            -0.53
httphttps       -0.51
Positive Logits
punishment      1.52
punish          1.49
reward          1.42
punishments     1.36
punished        1.33
punishing       1.32
revenge         1.30
Punishment      1.29
rewards         1.27
Reward          1.24
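Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. That gives one score per vocabulary token, and the highest and lowest scores are then reported. The sketch below assumes exactly that method and ignores any final layer norm. The function name `top_logits`, the toy vocabulary, and the matrix values are all hypothetical, chosen only for illustration.

```python
import numpy as np

def top_logits(decoder_direction, W_U, vocab, k=10):
    """Hypothetical helper: project a feature direction onto the vocabulary
    and return the k highest- and k lowest-scoring (token, score) pairs."""
    scores = decoder_direction @ W_U          # one score per vocabulary token
    order = np.argsort(scores)                # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy setup (hypothetical values): 2-dim residual stream, 4-token vocabulary.
vocab = ["punish", "reward", "postData", "the"]
W_U = np.array([[1.5, 1.4, -0.5, 0.0],
                [0.0, 0.1,  0.0, 0.2]])
d = np.array([1.0, 0.0])                      # assumed feature decoder direction

pos, neg = top_logits(d, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

With this toy direction, "punish" and "reward" score highest and "postData" scores lowest. This mirrors the shape of the lists above but does not reproduce them.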
Activation Density: 0.306%
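Activation density is usually read as the share of token positions on which the feature fires. A value of 0.306% means roughly 3 tokens in every 1,000. The minimal sketch below assumes that "fires" means an activation strictly greater than zero. The helper name `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(activations):
    """Hypothetical helper: percentage of positions where the feature is active (> 0)."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > 0) / activations.size

# Toy data: 1,000 token positions, the feature fires on 3 of them.
acts = np.zeros(1000)
acts[[5, 250, 900]] = [0.8, 2.1, 0.3]
print(activation_density(acts))
```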