INDEX
Explanations
sentences with the word "two" or phrases indicating dual aspects/topics/parts
sentences that mention quantities or lists of items
Negative Logits
wonders     -0.66
billions    -0.58
enance      -0.55
oro         -0.55
rup         -0.54
letes       -0.54
earch       -0.54
unning      -0.53
millions    -0.53
uble        -0.52
Positive Logits
Firstly     1.32
Both        1.25
One         1.19
one         1.19
first       1.09
Firstly     1.08
one         1.08
Both        1.07
One         1.01
Neither     1.00
Activations Density 0.285%
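
The page does not define how the density figure is computed. A minimal sketch, assuming it means the fraction of sampled token positions on which the feature has a nonzero activation; the function name `activation_density` and the toy activations are hypothetical, chosen only for illustration:

```python
def activation_density(activations):
    """Return the fraction of token positions where the feature fires.

    Assumption (not stated in the source): a position counts as active
    when its activation is strictly greater than zero.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical per-token activations: the feature fires on 2 of 8 tokens.
toy = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(toy):.3%}")
```

Under this reading, a density of 0.285% would correspond to the feature firing on roughly 3 of every 1,000 sampled tokens.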