INDEX
Explanations
numbers written as words
patterns related to numerical values
Negative Logits
Token        Logit
loo          -0.80
REAM         -0.73
WARD         -0.70
ffee         -0.70
hips         -0.70
Accessory    -0.66
hire         -0.64
Denis        -0.63
OHN          -0.63
RAFT         -0.63
Positive Logits
Token        Logit
eral         0.98
emonic       0.94
num          0.91
pty          0.88
quist        0.87
Num          0.87
phys         0.84
BER          0.77
num          0.75
iatures      0.74
Activations Density: 0.023%