INDEX
Explanations
the name "Pence" at various activation levels
references to political figures, specifically Mike Pence and Tim Kaine
Negative Logits
ivities      -0.69
actic        -0.69
odon         -0.68
opic         -0.67
variable     -0.65
izations     -0.63
Flavoring    -0.63
ivation      -0.63
liner        -0.63
lined        -0.61
Positive Logits
Pence        1.06
mire         0.96
OTUS         0.89
cair         0.78
eln          0.77
lette        0.77
versa        0.75
etz          0.75
pard         0.75
tti          0.73
Activations Density: 0.009%