INDEX
Explanations
numerical values and data-related figures
Negative Logits
رÙĪØª    -0.14
رش    -0.13
§    -0.13
-handed    -0.13
æĶ¾éĢģ    -0.13
understanding    -0.13
unny    -0.13
333    -0.13
gorith    -0.13
egment    -0.12
Positive Logits
uese    0.17
mojom    0.16
uestion    0.15
æ¤    0.14
Fet    0.14
edu    0.14
кÑģ    0.14
ίνη    0.14
анов    0.13
каÑģ    0.13
Activations Density 0.045%