INDEX
Explanations
references to the number "two" or pairs of items
instances of the word "two."
Negative Logits
Token       Logit
atown       -0.84
renheit     -0.79
Ô           -0.77
ovi         -0.75
Tradable    -0.72
untu        -0.72
fw          -0.72
yz          -0.71
ugu         -0.69
ondon       -0.68
Positive Logits
Token           Logit
halves          1.18
sexes           0.93
fold            0.93
sides           0.91
aforementioned  0.86
Kore            0.86
thirds          0.82
main            0.80
largest         0.79
finalists       0.79
Activations Density 0.063%
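
The density figure above can be read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 here).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 63 of which are active.
sample = [0.0] * 99_937 + [1.0] * 63
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample, 63 active positions out of 100,000 correspond to a density of 0.063%, which is how a value of this size would arise for a feature that fires on well under one token in a thousand.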