Negative Logits
urai     -0.16
hetto    -0.16
igham    -0.15
dda      -0.14
بوابة    -0.14
Illegal  -0.14
ouver    -0.14
rana     -0.13
cent     -0.13
Canter   -0.13
Positive Logits
://       0.23
-equiv    0.15
ез        0.14
opleft    0.14
ahat      0.14
prising   0.14
.MODE     0.14
irsch     0.14
iping     0.14
otland    0.14
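The two lists above show the ten lowest-valued and ten highest-valued tokens for this feature. This page does not say how the per-token values are computed. Assuming a value exists for every vocabulary entry, selecting the extremes is a partial sort. The sketch below shows that step. `extreme_logits` is a hypothetical helper, and the input is a small subset copied from the lists.

```python
import heapq


def extreme_logits(logit_effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: assumes one value per token is already available.
    """
    # nsmallest/nlargest return results already sorted from most extreme inward.
    negative = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Hypothetical input: a few values copied from the lists above.
effects = {"://": 0.23, "-equiv": 0.15, "urai": -0.16, "dda": -0.14, "cent": -0.13}
neg, pos = extreme_logits(effects, k=2)
print(neg)  # [('urai', -0.16), ('dda', -0.14)]
print(pos)  # [('://', 0.23), ('-equiv', 0.15)]
```

`heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary. This matters when the mapping has tens of thousands of entries and only the top and bottom ten are displayed.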
Activation density: 0.031%
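Activation density is commonly read as the share of token positions on which the feature fires. A density of 0.031% therefore means roughly 3 firing positions per 10,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. Both the threshold and the sample data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: a position counts as firing only if activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 3 firing positions out of 10,000 tokens.
acts = [0.0] * 10_000
acts[5] = acts[42] = acts[9_001] = 1.7
print(f"{activation_density(acts):.3%}")  # 0.030%
```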