INDEX
Explanations
occurrences of the word "two" and related quantifiers
NEGATIVE LOGITS
itſelf     -1.09
houſe      -1.07
fubject    -1.04
Majefty    -0.98
Grecs      -0.98
Jefus      -0.98
ſche       -0.95
himſelf    -0.95
Eſ         -0.93
ſtate      -0.93
POSITIVE LOGITS
two        1.24
Two        1.02
TWO        0.91
TWO        0.90
Two        0.87
deux       0.86
two        0.86
zwei       0.81
两         0.81
två        0.80
Activations Density 0.234%