Explanations
quantifiers and explicit quantities
Negative Logits
ος           -1.08
}}^{(        -0.99
これが       -0.95
mbal         -0.94
blis         -0.94
Substanz     -0.93
Ganzen       -0.93
⎛            -0.91
さん         -0.91
鹇           -0.90
Positive Logits
both         1.13
Both         1.06
各有         1.04
都是在       1.02
copious      0.99
both         0.97
BOTH         0.97
EVERY        0.97
считается    0.97
TWO          0.92
Activations Density 0.010%