Explanations
profane language, specifically the word "shit"
occurrences of strong negative expressions
Negative Logits
gary                                -0.81
HCR                                 -0.80
================================    -0.74
PsyNetMessage                       -0.73
fman                                -0.70
AUT                                 -0.69
Expend                              -0.67
NetMessage                          -0.67
BIL                                 -0.65
Minor                               -0.63
Positive Logits
shit     0.99
heads    0.93
storm    0.93
lord     0.92
bags     0.92
wit      0.90
loads    0.90
faced    0.90
lords    0.88
shots    0.86
Activations Density 0.014%