Negative Logits
_war        -0.07
-badge      -0.06
čer         -0.06
об          -0.06
haired      -0.06
squirrel    -0.06
أي          -0.06
有          -0.06
Manit       -0.06
дві         -0.06
Positive Logits
discussion  0.07
Counsel     0.07
策          0.06
scribe      0.06
dispro      0.06
ignore      0.06
condemning  0.06
(screen     0.06
idea        0.06
LGBTQ       0.06
Activations Density: 0.002%