Explanations
superlatives indicating extreme negativity such as "worst"
references to the term "worst," particularly in negative contexts
Negative Logits
arij        -0.79
alde        -0.75
ulton       -0.74
ijk         -0.73
iverpool    -0.72
dinand      -0.72
itialized   -0.71
mun         -0.70
ependence   -0.69
bara        -0.68
Positive Logits
nightmare    0.92
imaginable   0.89
worst        0.88
Worst        0.86
offender     0.86
worst        0.85
offenders    0.79
EST          0.74
nightmares   0.73
iary         0.73
Activations Density: 0.011%
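
To give the density figure a sense of scale, here is a minimal sketch that converts it into counts. It assumes, as is common for dashboards of this kind but not stated here, that the density is the fraction of sampled tokens on which the feature has a non-zero activation. The function name and the one-million-token sample size are hypothetical and chosen only for illustration.

```python
def expected_activations(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens with a non-zero activation.

    Assumption: density_percent is a per-token firing rate given in percent.
    """
    return density_percent / 100.0 * n_tokens


DENSITY_PERCENT = 0.011  # value reported above

# Expected firing count in a hypothetical sample of one million tokens.
per_million = expected_activations(DENSITY_PERCENT, 1_000_000)

# Average number of tokens between activations under the same assumption.
tokens_per_activation = 100.0 / DENSITY_PERCENT

print(f"{per_million:.0f} activations per 1,000,000 tokens")
print(f"about 1 activation every {tokens_per_activation:.0f} tokens")
```

Under that reading, the feature fires on roughly 110 tokens per million, or about once every 9,091 tokens, which fits a feature tied to a narrow set of words such as "worst" and "nightmare".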