Explanations
references to double-digit numerical values
Negative Logits
ouf      -0.80
hire     -0.78
dep      -0.74
nd       -0.72
erker    -0.72
enes     -0.72
rael     -0.71
low      -0.71
rup      -0.71
lain     -0.70
Positive Logits
ized         0.92
omial        0.91
ised         0.90
itialized    0.88
ally         0.86
digits       0.83
igr          0.83
ographic     0.82
umeric       0.82
ization      0.82
Activations Density: 0.031%