INDEX
Explanations
phrases related to punishment and justice
terms related to punishment and its enforcement
Negative Logits
gow      -0.87
coe      -0.83
park     -0.73
opic     -0.72
leaf     -0.71
Champ    -0.71
yip      -0.70
Herz     -0.69
aug      -0.69
eds      -0.67
POSITIVE LOGITS
punished       1.11
harshly        1.06
punishments    1.03
punish         1.01
punishment     1.01
inflicted      0.83
severely       0.82
punishing      0.80
sanction       0.79
penalties      0.78
Activations Density 0.029%