INDEX
Explanations
phrases related to extreme or negative situations
instances of the word "worst" to highlight negative situations or conditions
Negative Logits
agine       -0.81
raltar      -0.80
sure        -0.78
trl         -0.74
dinand      -0.73
andise      -0.73
uador       -0.72
itialized   -0.70
bold        -0.70
leans       -0.69
Positive Logits
offender    1.00
imaginable  0.96
offenders   0.94
nightmare   0.93
behaved     0.83
nightmares  0.77
losers      0.75
possible    0.75
liest       0.74
loser       0.73
Activations Density: 0.022%