INDEX
Explanations
profanity and vulgar expressions
Negative Logits
lähe        -0.68
كومونز      -0.67
Adagio      -0.66
समीक्षक     -0.66
lewati      -0.66
Bisch       -0.64
تفصیلات     -0.64
awtextra    -0.64
Gale        -0.63
,–          -0.62
Positive Logits
fuck        1.10
Fuck        1.09
fuck        1.07
fucking     1.06
FUCK        1.04
Fuck        1.03
damn        1.01
shit        1.00
Fucking     0.99
goddamn     0.98
Activations Density 0.142%