INDEX
Explanations
the word "single" associated with a numerical value
Negative Logits
akings        -0.91
apons         -0.84
Downloadha    -0.83
ooks          -0.81
raints        -0.75
ours          -0.75
olas          -0.74
UFF           -0.71
iquette       -0.69
acements      -0.69
Positive Logits
handedly      1.18
digit         1.06
ton           1.03
person        0.99
piece         0.99
molecule      0.90
digits        0.90
minute        0.89
minded        0.88
batch         0.87
Activations Density: 0.022%