Explanations
negative outcomes or consequences
phrases indicating a negative assessment or worsening situations
Negative Logits
Token          Logit
bern           -0.72
eur            -0.67
#$             -0.64
utm            -0.63
elin           -0.62
ulating        -0.62
landers        -0.60
hyde           -0.60
zyme           -0.60
conservancy    -0.59
Positive Logits
Token          Logit
yet            1.23
still          1.14
than           1.12
yet            0.96
than           0.94
Than           0.88
Yet            0.83
Still          0.79
Yet            0.74
Still          0.67
Activations Density 0.072%