Explanations
proper nouns
mentions of the name "Joan."
Negative Logits
nut        -0.73
rals       -0.68
phrine     -0.68
ï¸ı        -0.66
rave       -0.65
ork        -0.63
NESS       -0.63
regulated  -0.63
mented     -0.62
ropolitan  -0.62
Positive Logits
Joan       0.80
naire      0.75
Anne       0.74
Rei        0.72
Crawford   0.72
nee        0.72
ja         0.72
Doe        0.72
istic      0.70
Hamm       0.70
Activation Density: 0.030%