INDEX
Explanations
specific characters or symbols that may indicate formatting or coding elements within the text
Negative Logits
"`            -0.16
"             -0.15
adversely     -0.15
hubby         -0.14
Totally       -0.14
.connector    -0.13
totally       -0.13
":-           -0.13
Oops          -0.13
SF            -0.13
Positive Logits
fucking       0.33
fucked        0.29
fuck          0.27
Fucking       0.27
fucks         0.25
FUCK          0.25
Fuck          0.24
fuck          0.24
cunt          0.23
–↵            0.22
Activations Density 0.004%