INDEX
Explanations
instances of skepticism or disagreement
expressions of uncertainty or controversy surrounding social issues
Negative Logits
Token       Logit
ãĤ§         -0.70
ula         -0.67
alde        -0.66
bowl        -0.64
si          -0.63
orean       -0.61
().         -0.61
atos        -0.61
omever      -0.60
Americ      -0.60
Positive Logits
Token         Logit
nonetheless   1.16
nevertheless  0.93
etheless      0.92
cautioned     0.74
undeniably    0.71
concedes      0.71
curiously     0.71
balk          0.69
persists      0.69
elusive       0.68
Activations Density 0.797%