INDEX
Explanations
instances of the word "but" and variations that signal contrast or objections within the text
Negative Logits
oire      -0.70
ampa      -0.69
onomy     -0.61
Pigs      -0.60
},"       -0.59
Mehran    -0.58
velt      -0.58
wake      -0.57
Highest   -0.57
anthem    -0.56
Positive Logits
tons           1.10
chery          1.06
nonetheless    1.02
otherwise      0.98
nevertheless   0.96
alas           0.95
lacks          0.93
lacked         0.91
lacking        0.90
still          0.90
Activations Density 0.079%