Explanations
the word "worse"
references to worsening situations or conditions
Negative Logits
ettes        -0.87
encers       -0.82
itialized    -0.79
ools         -0.77
ette         -0.75
etition      -0.75
heter        -0.73
oir          -0.72
ieu          -0.72
trl          -0.72
Positive Logits
than         1.12
Than         1.05
behaved      0.92
than         0.85
nightmares   0.83
catast       0.82
nightmare    0.77
nces         0.75
Ukrain       0.68
undermin     0.68
Activations Density: 0.020%