Explanations
punctuation marks and quotes
Negative Logits
Token       Logit
alem        -0.16
.infinity   -0.14
.assertIs   -0.14
ży          -0.14
ield        -0.14
ismet       -0.14
ÅŁt         -0.13
zk          -0.13
ashi        -0.13
xfd         -0.13
Positive Logits
Token       Logit
út          0.18
587         0.16
673         0.15
583         0.15
egin        0.15
577         0.15
fitte       0.15
rest        0.14
690         0.14
çģ          0.14
Activations Density: 0.002%