INDEX
Explanations
names of famous individuals
references to notable directors, films, and cultural icons
Negative Logits
shown       -0.72
leased      -0.70
large       -0.70
profits     -0.69
events      -0.66
olars       -0.65
LIMITED     -0.65
ESCO        -0.64
aughters    -0.64
anwhile     -0.64
POSITIVE LOGITS
esque          1.56
mentality      1.15
vibe           1.14
style          1.07
ian            1.06
kind           1.02
Syndrome       0.98
type           0.96
bullshit       0.96
proportions    0.96
Activations Density: 0.381%