Explanations
profane and offensive language
words related to bodily actions and expressions, often with a humorous or vulgar tone
Negative Logits
eting          -0.69
eters          -0.64
subcontract    -0.63
ipers          -0.62
eder           -0.59
Guards         -0.59
ļéĨĴ           -0.58
rence          -0.57
atu            -0.57
ration         -0.56
Positive Logits
vana            0.82
lucky           0.71
bit             0.69
licks           0.68
darn            0.67
THING           0.66
messy           0.66
unlucky         0.64
anked           0.64
luck            0.63
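The two lists above are the ten tokens with the most negative and the ten with the most positive values for this feature. The page does not say how each per-token value was computed, so the sketch below only covers the ranking step. It assumes the values are already available as a hypothetical `logit_effects` mapping from token string to value, and `top_and_bottom_logits` is an illustrative helper name, not part of any library.

```python
def top_and_bottom_logits(logit_effects: dict[str, float], k: int = 10):
    """Return the k most negative and the k most positive (token, value) pairs.

    `logit_effects` is a hypothetical token -> value mapping; how those values
    are produced is outside the scope of this sketch.
    """
    # Sort ascending by value so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # reverse the order: most positive first
    return negative, positive


# Usage with a few values copied from the lists above.
sample = {"eting": -0.69, "eters": -0.64, "vana": 0.82, "lucky": 0.71, "bit": 0.69}
neg, pos = top_and_bottom_logits(sample, k=2)
print(neg)  # [('eting', -0.69), ('eters', -0.64)]
print(pos)  # [('vana', 0.82), ('lucky', 0.71)]
```

With distinct values the order is unambiguous. When values tie (for example `THING` and `messy` at 0.66), the reversed slice lists tied tokens in the opposite of their insertion order, so the display order of ties is arbitrary.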
Activation Density: 0.316%
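Activation density is the share of tokens on which the feature is active. A value of 0.316% means it fires on roughly 1 token in 316 (1 / 0.00316 ≈ 316). The sketch below assumes that "active" means an activation strictly above a threshold of 0 and that the activations have been collected into a flat list. Both are assumptions, because the page does not state its exact criterion. `activation_density` is a hypothetical helper name.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes one activation value per token position; the default threshold of
    0.0 is an assumption, not a documented setting of the page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Usage: 3 active positions out of 1,000 tokens gives a density of 0.3%.
acts = [0.0] * 997 + [1.2, 0.5, 2.0]
print(activation_density(acts))
```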