INDEX
NEGATIVE LOGITS
Christopher
-0.08
valign
-0.08
zwa
-0.07
chrono
-0.07
확보
-0.07
�ราะห์
-0.07
Lagu
-0.07
busc
-0.07
("//*[@
-0.07
ROR
-0.07
POSITIVE LOGITS
profanity
0.17
misconduct
0.14
offend
0.14
obscene
0.13
offending
0.13
derog
0.12
vulgar
0.12
extremist
0.12
hateful
0.12
taboo
0.12
Activations Density 0.012%