INDEX
Explanations
references to the number two and its variations in different contexts
Negative Logits
of        -0.17
among     -0.17
fewer     -0.17
inding    -0.15
both      -0.15
Laughs    -0.14
among     -0.14
1         -0.14
amongst   -0.14
ena       -0.13
Positive Logits
remaining       0.26
remaining       0.23
aforementioned  0.22
aviest          0.21
newest          0.21
latest          0.21
latest          0.20
Remaining       0.18
fold            0.18
Remaining       0.18
Activations Density 0.123%