INDEX
Explanations
phrases related to contrasting statements or clauses
commas in the text
Negative Logits
è¦ļéĨĴ      -0.62
sth         -0.62
ode         -0.61
eers        -0.60
cial        -0.58
ttes        -0.58
nexus       -0.57
://         -0.57
ruction     -0.56
ze          -0.54
Positive Logits
unlike           1.19
alas             1.16
contrary         1.12
despite          1.09
although         1.05
hey              1.04
despite          0.99
barring          0.97
unsurprisingly   0.96
unfortunately    0.95
Activations Density 0.085%
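
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions where the feature's activation is above zero; the page itself does not state this definition. The function name `activation_density`, the `threshold` parameter, and the synthetic activations are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the source page).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: the feature fires on 17 of 20,000 token positions.
acts = [0.0] * 20000
for i in range(17):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.085%
```

Under this reading, a density of 0.085% corresponds to the feature firing on roughly 85 out of every 100,000 tokens.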