Explanations
strong curse words
strong profanity and expressions of frustration
Negative Logits
Token       Logit
Edit        -0.86
BIL         -0.86
inel        -0.82
ãĤ¢ãĥ«      -0.82
knit        -0.81
irtual      -0.81
behavi      -0.75
Flavoring   -0.75
opian       -0.74
Msg         -0.74
Positive Logits
Token       Logit
kidding     0.97
hell        0.89
bastard     0.89
idiot       0.87
retard      0.84
stink       0.81
asshole     0.80
shit        0.80
thing       0.79
idiots      0.79
Activations Density 0.054%