INDEX
NEGATIVE LOGITS
ω         0.39
BU        0.39
Curtis    0.37
arko      0.37
czeniu    0.37
ап        0.36
td        0.36
apache    0.36
Rad       0.36
lande     0.36
POSITIVE LOGITS
emoticon  0.51
someone   0.48
Emoji     0.48
emotions  0.47
emojis    0.46
elements  0.45
mouth     0.44
mouths    0.44
emoji     0.43
tampon    0.43
Activations Density 0.003%