Explanations
instances of numeric values, potentially related to data or configurations
Negative Logits
atham        -0.17
apped        -0.16
ران          -0.15
vala         -0.14
605          -0.14
ngo          -0.14
pedo         -0.14
acks         -0.13
Lever        -0.13
Assertions   -0.13
Positive Logits
fisse        0.17
menin        0.16
mani         0.15
zl           0.14
ily          0.14
rek          0.14
kers         0.14
lash         0.14
jab          0.14
uzzi         0.14
Activations Density: 0.013%