INDEX
Explanations
superlatives expressing extreme negativity
references to the concept of "worst" or negative experiences
Negative Logits
ynthesis     -0.87
ools         -0.86
andise       -0.86
agine        -0.79
uador        -0.79
sure         -0.79
itialized    -0.78
thouse       -0.78
arya         -0.76
icles        -0.76
Positive Logits
offenders     1.03
offender      1.02
nightmare     0.93
nightmares    0.90
behaved       0.80
plag          0.78
catast        0.77
atrocities    0.76
imaginable    0.75
plague        0.75
Activations Density 0.042%