INDEX
Explanations
- Phrases related to prominent people or events
- Proper nouns and specific terms related to entities or events
Negative logits
Token    Logit
princ    -0.78
decl     -0.72
subp     -0.71
Princ    -0.70
McGu     -0.69
mos      -0.68
Chap     -0.68
apr      -0.68
Uni      -0.67
hemor    -0.67
Positive logits
Token    Logit
ing      1.57
ings     1.28
ed       1.27
ING      1.09
edIn     1.07
edly     1.05
ington   1.01
ment     1.00
ization  0.98
ership   0.98
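Dashboards of this kind usually build the two logit lists above by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. The sketch below shows that procedure under that assumption only. The function names `feature_logits` and `logit_extremes`, the 2-dimensional residual stream, and the toy vocabulary and weights are all hypothetical and invented for illustration. They are not the values behind the tables.

```python
# Hypothetical sketch: how a dashboard might derive top/bottom logit lists.
# Assumption: the feature's logit effect is its decoder direction multiplied
# by the unembedding matrix W_U (d_model x d_vocab). Toy numbers only.


def feature_logits(decoder_dir, unembed):
    """Return decoder_dir @ unembed as a list with one value per vocab token."""
    d_vocab = len(unembed[0])
    return [
        sum(decoder_dir[m] * unembed[m][v] for m in range(len(decoder_dir)))
        for v in range(d_vocab)
    ]


def logit_extremes(logits, vocab, k):
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most suppressed first
    positive = ranked[::-1][:k]    # descending order: most promoted first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (invented).
vocab = ["ing", "ed", "princ", "decl"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.4, -0.2],
    [0.2, 0.1, 0.6, 0.4],
]
negative, positive = logit_extremes(feature_logits(decoder_dir, unembed), vocab, k=2)
print("negative:", [token for token, _ in negative])  # prints ['princ', 'decl']
print("positive:", [token for token, _ in positive])  # prints ['ing', 'ed']
```

With real weights, `decoder_dir` would be the feature's row of the SAE decoder and `unembed` the model's W_U. Sorting once and slicing both ends keeps the two lists consistent with each other.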
Activation density: 0.165%
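Activation density is commonly reported as the fraction of sampled tokens on which the feature's activation is nonzero. Here is a minimal sketch under that assumption. It takes a flat list of per-token activations and uses a zero threshold. The function `activation_density` and the toy sample are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# on which a feature's activation is strictly above a threshold (0 by default).


def activation_density(activations, threshold=0.0):
    """Return the fraction of entries in `activations` greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (invented): 1,000 token positions, the feature fires on 2 of them.
toy_acts = [0.0] * 998 + [0.4, 2.1]
print(f"{activation_density(toy_acts):.3%}")  # prints 0.200%
```

Under this definition, the reported 0.165% would mean the feature fires on roughly 1.65 tokens per 1,000, or about one token in 600.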