INDEX
Explanations
references to punitive actions or consequences
references to punishment in various contexts
Negative Logits
Token             Logit
eds               -0.89
ergy              -0.80
rote              -0.75
oir               -0.75
NetMessage        -0.73
aug               -0.72
ocal              -0.72
sonian            -0.71
roots             -0.70
soDeliveryDate    -0.69
Positive Logits
Token             Logit
punishment        1.29
punishments       1.16
punish            1.05
punished          1.04
inflicted         0.97
sanction          0.93
ishment           0.92
humiliation       0.89
lashes            0.89
exha              0.87
Activations Density 0.015%
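
The source does not say how the density figure is defined. One common reading is the share of token positions on which the feature fires at all. Under that assumption, a minimal sketch of the arithmetic is below; the function name, the zero threshold, and the sample data are all hypothetical and chosen only to reproduce a value of 0.015%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions where the
    feature is active. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active positions out of 20,000 tokens.
sample = [0.0] * 19_997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Read this way, a density of 0.015% would mean the feature is active on roughly 15 of every 100,000 tokens, which fits a narrow feature tied to punishment-related text.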