INDEX
Explanations
proper nouns related to individuals or entities
proper nouns, specifically names of people and organizations
Negative Logits
iden        -0.81
icum        -0.75
alia        -0.71
icro        -0.68
opsis       -0.68
esthes      -0.68
alysis      -0.67
uning       -0.66
chers       -0.66
akedown     -0.66
Positive Logits
vernment    1.04
glers       0.89
ORGE        0.83
Pengu       0.81
irlfriend   0.80
CHQ         0.74
iants       0.72
raphic      0.72
finger      0.71
stones      0.70
Activations Density: 0.074%