INDEX
Explanations
the name "Joe" followed by a high activation value
mentions of the name "Joe."
Negative Logits
glim          -0.89
mble          -0.87
igators       -0.84
hips          -0.82
NESS          -0.79
igator        -0.75
raints        -0.74
igated        -0.74
chwitz        -0.71
rights        -0.71
Positive Logits
Biden          0.94
Arpaio         0.88
Rog            0.82
zzi            0.82
ppo            0.82
xtap           0.78
Danger         0.76
pport          0.76
Walsh          0.74
Scarborough    0.73
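The two lists above rank tokens by a per-token logit score, most negative first and most positive first. The sketch below shows only that ranking step. It assumes the scores are already available as a token-to-value mapping. How the dashboard actually computes them (for example, from the feature's output direction) is not stated here. `top_logits` is a hypothetical helper name.

```python
def top_logits(scores: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: `scores` already holds one logit score per token; this
    helper only ranks them, it does not compute them.
    """
    # Sort ascending by score, so the most negative tokens come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Usage with a few values copied from the lists above.
neg, pos = top_logits(
    {"Biden": 0.94, "Arpaio": 0.88, "glim": -0.89, "mble": -0.87}, k=2
)
print(neg)  # [('glim', -0.89), ('mble', -0.87)]
print(pos)  # [('Biden', 0.94), ('Arpaio', 0.88)]
```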
Activation Density: 0.013%
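Activation density is the percentage of token positions on which the feature fires. A value of 0.013% works out to roughly 13 active tokens per 100,000. The sketch below assumes that "active" means an activation strictly greater than zero. The dashboard's exact threshold and sample size are not given here. `activation_density` is a hypothetical helper.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 13 active positions out of 100,000 tokens gives a density of about 0.013%.
sample = [1.0] * 13 + [0.0] * (100_000 - 13)
print(activation_density(sample))
```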