Explanations
numerical values or sequences
Negative Logits
token     logit
fono      -0.17
leston    -0.16
umble     -0.15
edor      -0.15
raison    -0.15
чила      -0.14
afi       -0.14
illon     -0.14
okino     -0.14
iller     -0.14
Positive Logits
token     logit
acer      0.17
indeed    0.15
han       0.15
unlike    0.14
217       0.14
rob       0.14
osc       0.14
th        0.14
Abram     0.13
Indeed    0.13
Activations Density: 0.072%