Explanations
explicit language or profanity
instances of the word "shit" in various contexts
Negative Logits
Token          Logit
HCR            -0.74
fman           -0.71
Expend         -0.71
arb            -0.67
Somers         -0.66
PsyNetMessage  -0.65
ervation       -0.58
NetMessage     -0.57
Austral        -0.56
hani           -0.56
Positive Logits
Token          Logit
bags            1.02
heads           0.98
pants           0.93
loads           0.92
detector        0.92
posts           0.92
storm           0.91
shots           0.90
lords           0.87
shit            0.85
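The page does not say how the logit lists were produced. A common approach in sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the highest and lowest scoring tokens. The following is a minimal pure-Python sketch under that assumption only; `decoder_dir`, `unembed`, `logit_effects`, `top_and_bottom`, and the toy vocabulary and numbers are all hypothetical and are not taken from this dashboard.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature's decoder direction through an unembedding matrix.

    Assumed shapes (hypothetical): decoder_dir has d floats; unembed has d rows,
    each with len(vocab) floats (a d x V matrix). Returns (token, score) pairs.
    """
    scores = [0.0] * len(vocab)
    for d_i, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w  # accumulate the dot product column by column
    return list(zip(vocab, scores))


def top_and_bottom(pairs, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (most negative first)."""
    ranked = sorted(pairs, key=lambda p: p[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example: a 3-dimensional residual stream and a 5-token vocabulary (made-up values).
vocab = ["bags", "heads", "HCR", "fman", "the"]
decoder_dir = [1.0, 0.5, -1.0]
unembed = [
    [0.6, 0.5, -0.4, -0.3, 0.0],
    [0.4, 0.4, -0.2, -0.4, 0.1],
    [-0.2, -0.1, 0.3, 0.2, 0.0],
]

pairs = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(pairs, 2)
print("positive:", [(t, round(s, 2)) for t, s in top])
print("negative:", [(t, round(s, 2)) for t, s in bottom])
```

Real dashboards typically apply the same projection to the full vocabulary, often after normalizing the unembedding or subtracting its mean, which this sketch omits.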
Activation Density: 0.040%
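Activation density is usually read as the share of token positions in a sample corpus where the feature fires. The page does not state the corpus or the firing threshold, so the sketch below assumes "fires" means an activation strictly above 0.0. The function name `activation_density` and the toy activation list are hypothetical. Under this definition, 0.040% corresponds to roughly 4 firing positions per 10,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 10,000 positions, of which 4 have a positive activation.
acts = [0.0] * 9996 + [1.2, 0.3, 2.0, 0.7]
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```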