INDEX
Explanations
inappropriate language or offensive terms
intense emotional expressions, particularly those involving strong profanity
Negative Logits
ère           -0.76
Msg           -0.76
Folder        -0.74
anu           -0.73
BIL           -0.72
laus          -0.71
lication      -0.70
NetMessage    -0.70
inel          -0.69
ilus          -0.69
Positive Logits
kidding       0.95
fucking       0.89
fuck          0.82
asshole       0.81
bast          0.81
goddamn       0.80
shit          0.78
idiots        0.75
dick          0.75
fuckin        0.75
Activations Density 0.027%
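
The page does not define the density figure. One plausible reading is that it is the share of sampled token positions on which the feature fires at all. The sketch below shows that calculation under this assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active. The source page does not state how its figure was computed.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on nearly every token, which fits a feature tied to a narrow vocabulary such as profanity.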