Explanations
references to specific individuals and their familial connections
Last names starting with prefixes
people's names
Negative Logits
Diſ      -0.95
Perſ     -0.91
Reſ      -0.88
Houſe    -0.88
Conſ     -0.86
Anſ      -0.84
Inſ      -0.81
abſ      -0.81
Beſ      -0.78
uſ       -0.77
Positive Logits
✨:       0.73
Smith    0.73
De       0.70
Johnson  0.67
Mc       0.66
White    0.66
Green    0.65
O        0.65
Le       0.64
Brown    0.63
Activations Density: 0.781%