INDEX
Explanations
references to music, films, and their titles
Negative Logits
chron     -0.20
aud       -0.19
grav      -0.18
parent    -0.18
craft     -0.18
gender    -0.18
birth     -0.17
smart     -0.17
dish      -0.17
equal     -0.17
Positive Logits
Murder    0.26
Poison    0.26
Ghost     0.26
Midnight  0.26
Dead      0.26
Broken    0.25
Lonely    0.24
Kiss      0.24
Dirty     0.24
Night     0.24
Activations Density: 0.128%