Explanations
instances where the statement or action is contradictory
the word "but", indicating contrasting statements
Negative Logits
edu     -0.78
gie     -0.70
oire    -0.69
uto     -0.68
ampa    -0.67
esc     -0.67
ige     -0.65
sky     -0.65
tnc     -0.63
Times   -0.63
Positive Logits
alas           1.14
nevertheless   1.13
nonetheless    1.12
lacks          0.95
fortunately    0.95
retains        0.94
lacked         0.92
ignores        0.91
luckily        0.91
chery          0.90
Activations Density: 0.176%
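
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of sampled token positions on which the feature's activation is above a threshold. This is an assumption about the metric, not a description of how this page actually derived it. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so the result matches the percentage shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 176 of 100,000 token positions.
sample = [1.0] * 176 + [0.0] * (100_000 - 176)
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.176%"
```

Under this assumed definition, a density of 0.176% means the feature is active on roughly 1 in every 570 tokens, which is consistent with a feature tied to specific contrastive words such as "but", "nevertheless", and "nonetheless".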