NEGATIVE LOGITS
freely  -0.30
å¦ĸ  -0.27
èħĶ  -0.27
ç«ĭä½ĵ  -0.26
exit  -0.26
pencil  -0.26
stereotype  -0.25
stere  -0.25
teil  -0.25
æľºç»Ħ  -0.25
POSITIVE LOGITS
æİĸ  0.27
Himself  0.24
åĽŀèIJ½  0.23
PROGMEM  0.23
Guards  0.23
mary  0.23
dõi  0.23
æıIJä¾ĽåķĨ  0.23
ressed  0.23
åζéĢłåķĨ  0.23
Activations Density: 0.005%