INDEX
Explanations
proper nouns
names of characters and references to popular culture
Negative Logits
itutional      -0.70
URA            -0.70
EMBER          -0.69
elig           -0.69
IENT           -0.69
tert           -0.68
sympathetic    -0.65
unde           -0.65
.''.           -0.65
latitude       -0.64
Positive Logits
Soup           1.10
Girl           1.10
Pants          1.09
Pie            1.07
Mania          1.03
Alley          1.03
Balls          1.02
Bunny          1.02
pants          1.01
Juice          1.00
Activations Density: 0.313%