Explanations
the word "two" occurring in different contexts
instances of the word "two" in various contexts
Negative Logits
Token       Logit
ugu         -0.86
uga         -0.79
Ô           -0.78
ategory     -0.77
schild      -0.76
renheit     -0.76
urgy        -0.75
unta        -0.74
ourke       -0.73
lation      -0.73
Positive Logits
Token       Logit
halves      1.51
sides       1.40
sexes       1.22
thirds      1.14
parties     1.08
fold        1.05
Kore        1.00
finalists   0.98
extremes    0.98
gentlemen   0.95
Activations Density: 0.047%