INDEX
Negative Logits
avage    -0.17
zin      -0.16
aines    -0.15
ardu     -0.14
ired     -0.14
orra     -0.14
hausen   -0.14
wend     -0.14
align    -0.14
cre      -0.14
Positive Logits
anka     0.19
blem     0.17
ests     0.16
prav     0.16
286      0.16
ù        0.15
Pri      0.15
iado     0.15
ileged   0.15
incip    0.15
Activations Density 0.010%