Explanations
words related to negative situations or conditions
phrases indicating a decline or worsening situation
Negative Logits
etition     -0.78
oir         -0.78
ieu         -0.73
alde        -0.73
encers      -0.71
orney       -0.69
ivities     -0.69
aut         -0.69
rylic       -0.69
riage       -0.68
Positive Logits
worse       0.98
than        0.95
destro      0.91
behaved     0.89
nightmare   0.86
nightmares  0.84
Worse       0.82
nces        0.82
catast      0.78
undermin    0.76
Activations Density 0.010%