Explanations
qualifying words used during mathematical reasoning.
Negative Logits
iyorum      -0.07
isko        -0.06
ıyorum      -0.06
ルク        -0.06
πρέπει      -0.06
rire        -0.06
.hasMore    -0.06
دارم        -0.06
不可        -0.05
(can        -0.05
Positive Logits
would       0.45
would       0.38
Would       0.36
Would       0.35
wouldn      0.29
zou         0.24
skulle      0.24
würde       0.23
serait      0.23
Wouldn      0.22
Activations Density 0.356%