Explanations
terms related to censorship and control, especially in the context of offensive language
words and phrases related to assistance or prevention
discussions related to the prevention of harmful or offensive actions and terms
Negative Logits
uploads        -0.50
largeDownload  -0.49
Morty          -0.49
pse            -0.46
Hide           -0.42
nutshell       -0.40
âĶľ            -0.40
pmwiki         -0.40
Brewers        -0.40
Fine           -0.40
Positive Logits
livion         0.50
footing        0.50
izont          0.45
omever         0.44
$.             0.44
burse          0.44
someday        0.43
ictionary      0.43
ãģ¾            0.43
poke           0.43
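
Lists like the two above are usually read as the tokens a feature pushes down or up the most. The sketch below shows one plausible way to produce such a ranking from per-token scores. The function name `rank_logit_effects` and the `effects` dictionary are hypothetical, the numbers are a small subset copied from the lists above, and nothing here is confirmed as the method this page actually used.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def rank_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (k most negative, k most positive) tokens by score.

    `effects` maps a token string to the feature's assumed effect on that
    token's output logit (hypothetical input format).
    """
    # Sort ascending so the most negative scores come first.
    ordered = sorted(effects.items(), key=lambda item: item[1])
    negative = ordered[:k]
    # Reverse the ascending order to get the most positive scores first.
    positive = ordered[::-1][:k]
    return negative, positive


# Toy subset of the token/value pairs listed above.
effects = {
    "uploads": -0.50,
    "Morty": -0.49,
    "Hide": -0.42,
    "footing": 0.50,
    "izont": 0.45,
    "poke": 0.43,
}

negative, positive = rank_logit_effects(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

With the full vocabulary in `effects` and `k=10`, the two returned lists would have the same shape as the Negative Logits and Positive Logits blocks.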
Activations Density: 6.560%
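
An activation density figure of this kind is commonly understood as the share of sampled token positions on which the feature fires at all. The sketch below computes it under that assumption. The function name `activation_density`, the zero threshold, and the sample values are illustrative only and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for eight token positions; two are non-zero.
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 6.560% would mean the feature is active on roughly one in fifteen sampled tokens.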