Explanations
contrasting elements or concepts
phrases or clauses that contrast or negate a preceding statement
Negative Logits
ct       -0.75
enter    -0.72
nat      -0.69
ory      -0.67
asons    -0.66
uter     -0.65
uther    -0.65
wake     -0.65
tty      -0.61
ctor     -0.61
Positive Logits
nevertheless    1.03
nonetheless     0.95
rather          0.95
luckily         0.92
fortunately     0.92
suffice         0.87
alas            0.83
hey             0.81
rather          0.81
thankfully      0.79
Activations Density 0.130%