INDEX
NEGATIVE LOGITS
さい	-0.16
div	-0.15
arak	-0.15
otas	-0.14
arkin	-0.14
Guth	-0.14
ottle	-0.14
妮	-0.14
holm	-0.13
orting	-0.13
POSITIVE LOGITS
itto	0.16
krit	0.15
ONGO	0.15
oku	0.15
beck	0.14
orp	0.14
usal	0.14
avery	0.14
ghi	0.14
saints	0.13
Activations Density 0.003%