NEGATIVE LOGITS
没事 ("no problem")   -0.28
licted                -0.28
avings                -0.26
ewan                  -0.26
lanc                  -0.25
_Bool                 -0.25
运气 ("luck")          -0.25
leting                -0.25
çͳãģĹ                -0.25
harmless              -0.24
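Several non-Latin entries in the list are shown in a byte-level form: characters such as ĭ, Ķ, and ģ are what GPT-2-style byte-level BPE vocabularies use to stand in for individual UTF-8 bytes. Assuming this model's tokenizer follows that convention (an assumption; the dashboard does not say), the sketch below reverses the byte-to-character table and recovers the readable token. The "IJ" in an entry like `è¿IJæ°Ķ` is most likely the single ligature character `Ĳ` (U+0132), split into two letters by text extraction. The pass-through of characters outside the byte alphabet is also an assumption, added so partially decoded entries such as `没äºĭ` still decode.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Every remaining byte (space, control bytes, etc.) is shifted to U+0100 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable stand-in character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}


def decode_byte_level(token: str) -> str:
    """Turn a byte-level vocabulary entry back into ordinary text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Assumption: a character outside the byte alphabet (e.g. an
            # already decoded CJK character) is passed through as UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


for tok in ["æ²¡äºĭ", "没äºĭ", "è¿Ĳæ°Ķ", "Ġharmless"]:
    print(repr(decode_byte_level(tok)))
```

Running it prints `'没事'`, `'没事'`, `'运气'`, and `' harmless'`; the last line shows that `Ġ` encodes a leading space, which extracted token lists often drop.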
POSITIVE LOGITS
inton      0.28
Courier    0.28
PE         0.27
bows       0.27
bow        0.26
.sep       0.24
fus        0.24
stop       0.24
override   0.24
bul        0.23
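The card does not say how the logit lists are computed. A common convention in sparse-autoencoder feature dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector and list the largest and smallest values. The minimal sketch below assumes that convention, ignores the final layer norm, and uses a made-up two-dimensional model and four-token vocabulary; `logit_weights` and `top_and_bottom` are hypothetical helper names, not part of any dashboard's code.

```python
from typing import Sequence


def logit_weights(
    decoder_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> list[float]:
    """Dot product of one feature's decoder direction with every token's unembedding row.

    Assumes `unembed` is laid out as [vocab_size][d_model]; the final layer norm is ignored.
    """
    return [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom(
    weights: Sequence[float], vocab: Sequence[str], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative tokens, rounded to 2 decimals."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    bottom = [(vocab[i], round(weights[i], 2)) for i in order[:k]]
    # Slice from len - k (not -k) so that k == 0 yields an empty list.
    top = [(vocab[i], round(weights[i], 2)) for i in reversed(order[len(order) - k:])]
    return top, bottom


# Toy example: hypothetical 2-dimensional residual stream, 4-token vocabulary.
vocab = ["stop", "bow", "harmless", "_Bool"]
unembed = [[0.2, 0.0], [0.1, 0.4], [-0.2, -0.1], [0.0, -0.4]]
decoder_dir = [1.0, 0.5]

top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), vocab, k=2)
print("positive:", top)     # positive: [('bow', 0.3), ('stop', 0.2)]
print("negative:", bottom)  # negative: [('harmless', -0.25), ('_Bool', -0.2)]
```

Under this reading, a positive entry such as `bow` means the feature directly raises that token's logit when it fires, and a negative entry such as `harmless` means it directly lowers it.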
Activation density: 0.008% of tokens
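Read as the fraction of tokens on which the feature is active, 0.008% works out to about 8 tokens in every 100,000. The sketch below assumes "active" means an activation strictly above zero (the card does not state the threshold) and uses a hypothetical activation sample to reproduce the figure.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# Hypothetical sample: the feature fires on 8 of 100,000 tokens.
acts = [1.0] * 8 + [0.0] * 99_992
density = activation_density(acts)
print(f"{density:.3%}")  # 0.008%
```

Taking an iterable rather than a list lets the same function run over a streamed pass through a large token sample without holding every activation in memory.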