INDEX
Explanations
instances of contrasting information or unexpected outcomes
the conjunction "but" to signal contrasting ideas or exceptions
Negative Logits
resa        -0.61
velt        -0.60
olution     -0.58
uphem       -0.58
enced       -0.58
uto         -0.58
oin         -0.57
archment    -0.57
mop         -0.56
nce         -0.56
Positive Logits
tons            1.19
chery           1.05
nevertheless    0.87
tered           0.86
chers           0.86
alas            0.83
luckily         0.82
nonetheless     0.82
ler             0.80
fortunately     0.79
Activations Density: 0.100%