INDEX
Explanations
proper nouns, particularly those related to political figures and events
specific political figures or titles associated with events
Negative Logits
Philadelphia  -0.76
crew          -0.67
chem          -0.65
Seattle       -0.65
lectic        -0.65
Midwest       -0.64
harmonic      -0.64
webs          -0.64
CRE           -0.63
Denver        -0.63
Positive Logits
jri       1.30
oglu      1.12
ğ         1.12
oğ        1.10
Sr        0.95
presided  0.95
Vaj       0.94
resigned  0.92
ovich     0.92
llah      0.90
Activations Density 0.238%