INDEX
Explanations
phrases related to punishment or condemnation
phrases associated with punitive actions or legal consequences
Negative Logits
overlap       -0.74
ggles         -0.71
tilt          -0.67
glimps        -0.67
resid         -0.66
branching     -0.66
preview       -0.64
ynchron       -0.64
narrowing     -0.63
Rhythm        -0.63
Positive Logits
ASAP          1.36
lest          0.98
whoever       0.89
wherever      0.87
sooner        0.84
preferably    0.83
someday       0.81
unless        0.81
punishable    0.81
perpetrators  0.80
Activations Density 0.570%