Explanations
numbers represented in digit format
references to numerical values, particularly digits and their counts
Negative Logits
Token        Logit
hire         -0.90
ModLoader    -0.79
roxy         -0.78
nd           -0.77
ouf          -0.74
ritz         -0.72
dep          -0.71
lain         -0.71
Study        -0.70
rael         -0.68
Positive Logits
Token        Logit
eteen        0.87
omial        0.86
ized         0.86
igr          0.85
itial        0.84
oded         0.83
digits       0.83
ised         0.83
digit        0.82
umeric       0.79
Activations Density: 0.022%