INDEX
Explanations
mentions of the name "Biden", particularly focusing on those with higher activation values
mentions of the name "Biden."
Negative Logits
Ö¼          -0.79
ILCS        -0.78
Reviewer    -0.77
ivities     -0.73
ãĤ®         -0.71
ortmund     -0.70
ateur       -0.68
ELF         -0.68
orically    -0.68
ivated      -0.67
Positive Logits
Biden       1.26
zag         0.85
Caucus      0.80
mire        0.80
jug         0.75
appoint     0.71
aide        0.71
nominated   0.69
Bros        0.69
Jr          0.68
Activations Density 0.014%