INDEX
Explanations
profane language and swear words
instances of strong profanity
Negative Logits
inel          -0.78
eus           -0.76
anu           -0.73
Msg           -0.73
ère           -0.73
Var           -0.71
Mesh          -0.71
BIL           -0.71
idon          -0.70
NetMessage    -0.69
Positive Logits
fucking       1.00
kidding       0.92
fuck          0.84
goddamn       0.84
shit          0.84
fuckin        0.82
FUCK          0.81
prick         0.81
piss          0.78
asshole       0.78
Activations Density: 0.016%