INDEX
Explanations
contrasting statements, particularly focusing on disproving or negating an initial idea
contrasting statements that introduce clarification or correction
Negative Logits
asks          -0.71
illary        -0.66
meta          -0.64
ILLE          -0.63
nat           -0.62
ct            -0.62
holder        -0.61
enter         -0.60
orter         -0.60
ory           -0.59
Positive Logits
rather        1.38
nevertheless  1.12
merely        1.10
rather        1.09
nonetheless   1.02
suffice       0.98
Rather        0.96
instead       0.91
simply        0.90
luckily       0.84
Activations Density 0.110%
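
The page does not define the density figure. A minimal sketch of one plausible reading, assuming it is the fraction of sampled tokens on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 11 of 10,000 tokens.
sample = [0.0] * 9989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.110%"
```

Under this reading, a density of 0.110% means the feature is sparse: it fires on roughly one token in every 900.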