INDEX
Explanations
strong emotional language, particularly profanity
intense vulgar language and expressions of frustration or anger
Negative Logits
BIL       -0.84
opian     -0.79
knit      -0.75
Msg       -0.74
oppers    -0.73
anwhile   -0.71
ère       -0.71
Folder    -0.70
laus      -0.68
endant    -0.68
Positive Logits
kidding   0.89
bastard   0.78
thing     0.78
fucking   0.78
hell      0.73
idiot     0.73
THING     0.72
damn      0.70
prick     0.70
mess      0.70
Activations Density 0.037%
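
To give a sense of scale for the density figure, here is a minimal sketch that assumes "activation density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions for illustration, not something this page states.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: the page does not say how its density value is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 37 of which activate the feature.
sample = [0.0] * 99_963 + [1.5] * 37
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.037%
```

Under that assumption, a density of 0.037% corresponds to roughly 37 active positions per 100,000 tokens, so the feature fires rarely.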