INDEX
Explanations
occurrences of enumerative language and phrases indicating quantity or number
Negative Logits
inde          -0.19
693           -0.17
norge         -0.16
èįĴ           -0.15
ç¨            -0.15
Wilde         -0.15
individuals   -0.14
ramework      -0.14
pais          -0.14
ssel          -0.14
Positive Logits
اÙĪÙĬØ©        0.16
zik           0.16
ìĽĮíģ¬        0.15
ÎŁÎĶ          0.15
itest         0.15
ãĥģãĥ¼ãĥł     0.15
ÑĢезÑĥлÑĮÑĤ   0.15
tested        0.14
trial         0.14
Experiment    0.14
Activations Density 0.007%