INDEX
Explanations
names of people and places
references to food and dining experiences
Negative Logits
anecd      -0.57
encount    -0.57
Ambro      -0.56
hemor      -0.54
phasis     -0.54
eatures    -0.53
pestic     -0.53
Strait     -0.50
Hilbert    -0.50
Azerb      -0.49
Positive Logits
anymore    0.54
?!         0.52
tattoo     0.52
selfies    0.50
raping     0.49
Joker      0.49
quit       0.49
tattoos    0.49
porn       0.49
issors     0.48
Activations Density 1.466%