Explanations
proper names related to people
names of prominent individuals, particularly those associated with specific events or contexts
Negative Logits
Token       Logit
\/\/        -0.84
女          -0.83
dinand      -0.78
orpor       -0.72
astics      -0.70
reated      -0.70
reatment    -0.70
eenth       -0.69
ugal        -0.69
spr         -0.68
Positive Logits
Token       Logit
Conrad      0.95
aults       0.86
Cummings    0.69
Mai         0.69
Codex       0.68
Rogers      0.67
vier        0.66
ian         0.66
Akin        0.66
Orig        0.65
Activations Density: 0.011%