INDEX
Explanations
content that violates community guidelines or contains harmful behavior
Negative Logits
TexParameter    -0.14
efon            -0.14
ìĭ¤í            -0.14
igos            -0.14
RedirectTo      -0.14
stery           -0.14
WithEmail       -0.14
iosper          -0.14
flix            -0.13
HEMA            -0.13
Positive Logits
offensive       0.26
inflammatory    0.22
def             0.22
objection       0.22
hate            0.22
lib             0.21
copyrighted     0.21
obj             0.20
Offensive       0.20
violent         0.19
Activations Density 0.092%