Explanations
numerical values and quantifiers
Negative Logits
avez     -0.15
rog      -0.14
ossip    -0.14
ven      -0.14
arger    -0.14
waves    -0.14
Waves    -0.14
rein     -0.14
finish   -0.14
lox      -0.13
Positive Logits
ural      0.16
pyx       0.15
zin       0.15
_BEGIN    0.15
اوری      0.14
Assigned  0.14
gates     0.14
ivec      0.13
سانی      0.13
oly       0.13
Activations Density 0.003%