INDEX
NEGATIVE LOGITS
s          -0.91
i          -0.80
hates      -0.78
surla      -0.77
hate       -0.76
ship       -0.76
NSCoder    -0.76
ه          -0.75
Hate       -0.74
e          -0.74
POSITIVE LOGITS
bige       0.42
RunAsync   0.40
bewerken   0.37
wikimedia  0.37
wildcard   0.36
cuma       0.36
Thunk      0.35
abras      0.35
ucca       0.35
Cuz        0.35
Activations Density 0.083%