INDEX
Explanations
references to notable individuals and their contributions in film and literature
Negative Logits
constrained     -0.18
famously        -0.17
messaging       -0.17
mojo            -0.17
tasked          -0.17
marginalized    -0.17
gender          -0.16
genders         -0.16
gender          -0.16
backstory       -0.16
Positive Logits
Negro           0.21
gadget          0.19
gimm            0.17
showdown        0.17
licked          0.17
waterfront      0.16
setups          0.16
cigaret         0.16
gadgets         0.15
breakdown       0.15
Activations Density: 0.710%