Explanations
references to people's names, specifically with the name "Anna"
mentions of the name "Anna."
Negative Logits
Token     Logit
rance     -0.71
inct      -0.70
aneously  -0.68
ours      -0.68
enegger   -0.68
inction   -0.67
ember     -0.66
ierrez    -0.64
staff     -0.63
ted       -0.63
Positive Logits
Token     Logit
Karen     0.98
Maria     0.95
Nicole    0.92
issance   0.83
Anna      0.81
Kendrick  0.81
Anth      0.81
ette      0.81
isle      0.80
Louise    0.80
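Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k highest and k lowest scores. The sketch below illustrates that idea in pure Python under stated assumptions: `decoder_dir`, `unembed`, and the toy vocabulary are hypothetical stand-ins, and the exact scaling or normalization behind the numbers shown on this page is not stated here.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through a
    hypothetical unembedding matrix of shape (d_model, vocab_size),
    giving one score per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return top, bottom

# Toy example with made-up values (not this feature's real weights).
vocab = ["Anna", "Maria", "staff", "ted"]
decoder_dir = [2.0, 1.0]
unembed = [[1.0, 0.5, -1.0, 0.0],
           [0.5, 0.25, -0.5, -1.0]]
scores = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print(top)     # [('Anna', 2.5), ('Maria', 1.25)]
print(bottom)  # [('staff', -2.5), ('ted', -1.0)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.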
Activation Density: 0.025%
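A density of 0.025% is a fraction of 0.00025, or roughly 1 token in 4,000. The minimal sketch below shows that arithmetic, assuming density means the share of token positions where the feature's activation is strictly above zero; the threshold and the helper name `activation_density` are illustrative assumptions, not taken from this page.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption for illustration."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical run: the feature fires on 1 of 4,000 tokens.
acts = [0.0] * 3999 + [1.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```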