INDEX
Explanations
text containing explicit or graphic content
references to social and cultural commentary, often with informal and humorous undertones
Negative Logits
isSpecialOrderable    -0.73
asury                 -0.69
Reward                -0.67
safegu                -0.66
Import                -0.66
åĬ                    -0.66
qualitative           -0.65
Clear                 -0.65
¿½                    -0.65
Regulatory            -0.63
Positive Logits
lol      1.41
haha     1.39
;)       1.31
LOL      1.31
?!       1.30
!!!!     1.27
???      1.23
!!!!!    1.23
!!!      1.22
shit     1.22
Activations Density 0.618%