INDEX
Explanations
mentions of individuals and their professional roles or affiliations
Negative Logits
еÑĩ       -0.15
aylor     -0.15
(es       -0.14
iyan      -0.13
.tap      -0.13
Pole      -0.13
Ens       -0.13
_ED       -0.13
å¹³æĪIJ    -0.13
ìĦŃ       -0.13
Positive Logits
spokesman     0.28
spokeswoman   0.22
spokesperson  0.20
who           0.20
professor     0.19
spokes        0.19
research      0.17
director      0.16
speaking      0.16
author        0.15
Activations Density 0.124%