INDEX
Explanations
sentences containing phrases related to criticism or negative evaluation
phrases indicating negative impacts or consequences
Negative Logits
gency       -0.74
ugu         -0.73
cu          -0.71
ory         -0.69
apter       -0.69
Closing     -0.63
uther       -0.60
monton      -0.60
gery        -0.60
veil        -0.60
Positive Logits
also            0.96
downright       0.93
also            0.85
ALSO            0.83
actively        0.82
secondly        0.76
Secondly        0.76
strategically   0.73
cially          0.69
DES             0.69
Activations Density: 0.106%