INDEX
Negative Logits
gili          -0.17
riad          -0.17
ceb           -0.15
affen         -0.15
lij           -0.14
zig           -0.14
onom          -0.14
Roose         -0.14
STALL         -0.14
eldon         -0.14
Positive Logits
ing           0.17
Shr           0.15
Grace         0.15
discrimin     0.15
umont         0.14
grace         0.14
olarity       0.14
Grace         0.14
discriminate  0.13
_defs         0.13
Activations Density: 0.008%