Explanations
numbers, particularly years or dates
Negative Logits
Token     Logit
aina      -0.17
ritt      -0.15
eru       -0.15
lez       -0.14
atur      -0.14
amenti    -0.14
auction   -0.14
ano       -0.14
ament     -0.14
oru       -0.13
Positive Logits
Token     Logit
Burn      0.16
zeros     0.15
sta       0.15
ibble     0.14
loys      0.14
ross      0.14
loe       0.14
Char      0.14
nings     0.14
anik      0.14
Activations Density: 0.019%