Explanations
references to regulatory and policy-related concepts
Negative Logits
龄         -0.15
anders     -0.15
rox        -0.14
apyrus     -0.14
isky       -0.14
idine      -0.13
ascus      -0.13
ziy        -0.13
بلغ        -0.12
Obrázky    -0.12
Positive Logits
by         0.95
oleh       0.66
By         0.62
_by        0.56
توسط       0.56
by         0.53
By         0.52
.by        0.48
bợi        0.48
-by        0.44
Activation density: 0.316%