Explanations
adjectives related to severity or intensity
references to severe or tough situations
Negative Logits
Token          Logit
ovember        -0.92
phis           -0.91
ITNESS         -0.82
cellent        -0.81
ĸļ             -0.78
assies         -0.75
ublic          -0.75
ICLE           -0.75
ividual        -0.72
agine          -0.72
Positive Logits
Token          Logit
harsh           1.03
ness            0.93
harshly         0.92
retribution     0.88
punishments     0.88
winters         0.86
nesses          0.85
harsher         0.85
punitive        0.84
ened            0.84
Activation density: 0.025%