Explanations
profanity and vulgar language
occurrences of the word "shit" in various contexts
Negative Logits
Flavoring                         -0.75
obser                             -0.65
PsyNetMessage                     -0.65
NetMessage                        -0.62
Ctrl                              -0.62
================================  -0.59
tnc                               -0.58
investig                          -0.57
fman                              -0.56
Blanc                             -0.56
Positive Logits
loads                             1.34
storm                             1.24
bags                              1.19
holes                             1.18
lords                             1.17
faced                             1.16
load                              1.15
hole                              1.13
bag                               1.11
lord                              1.08
Activation Density: 0.091%