NEGATIVE LOGITS
Token        Logit
abo          -0.18
anou         -0.17
amarin       -0.14
ãģŃ          -0.14
ding         -0.14
hir          -0.14
istribute    -0.14
Ø·ÙĬ         -0.14
ноз          -0.14
sko          -0.14
POSITIVE LOGITS
Token        Logit
comb         0.37
bee          0.29
uckle        0.27
comb         0.23
bee          0.23
bees         0.22
ed           0.21
Comb         0.21
trap         0.20
com          0.20
Activations Density: 0.005%