INDEX
Explanations
mentions of the name "Joe" followed by a numerical activation value
the repeated mention of the name "Joe."
Negative Logits
Token                             Logit
NESS                              -0.91
hips                              -0.78
rawdownloadcloneembedreportprint  -0.77
ties                              -0.70
ample                             -0.68
ancy                              -0.67
ioned                             -0.65
imental                           -0.65
peed                              -0.64
seeing                            -0.63
Positive Logits
Token        Logit
Biden        1.10
Arpaio       1.07
Rog          0.91
Russo        0.88
Pes          0.87
Scarborough  0.87
Camel        0.85
Gibbs        0.84
Rao          0.82
ppo          0.80
Activations Density: 0.033%