INDEX
Explanations
phrases related to confrontations or arguments
expressions of dark humor or sarcasm
Negative Logits
²¾            -0.75
hement        -0.73
Orig          -0.70
assessments   -0.68
ij士           -0.68
ĨĴ            -0.68
erve          -0.67
isans         -0.67
Scope         -0.67
Asset         -0.67
Positive Logits
masturb       1.27
kissed        1.19
dunk          1.18
kiss          1.13
pee           1.13
shave         1.13
wear          1.09
fart          1.08
poop          1.07
lick          1.06
Activations Density: 0.609%