INDEX
Explanations
mentions or descriptions of penalties in various contexts
references to penalties or punitive measures
Negative Logits
oir      -0.80
atters   -0.77
ership   -0.74
Alive    -0.70
walking  -0.69
IFE      -0.69
rina     -0.69
geist    -0.69
ERN      -0.68
birth    -0.67
Positive Logits
penalties    1.28
penalty      1.13
levied       1.05
sanction     0.99
punishment   0.95
punish       0.94
punishments  0.94
imposed      0.92
undermin     0.91
punished     0.90
Activations Density: 0.009%