Explanations
variations of the word "profanity"
Negative Logits
Token     Logit
edes      -0.17
arch      -0.16
ENS       -0.16
dbl       -0.16
cust      -0.15
주ëĬĶ     -0.15
ero       -0.15
oggles    -0.15
unched    -0.14
Dynamo    -0.14
Positive Logits
Token     Logit
anity     0.28
prof      0.27
ane       0.20
essed     0.19
essional  0.19
ANE       0.18
essions   0.18
aned      0.18
PROF      0.17
umo       0.17
Activations Density 0.007%