INDEX
Explanations
names related to individuals and their affiliations
Negative Logits
lington   -0.17
elsing    -0.17
ols       -0.15
ries      -0.15
riend     -0.15
adol      -0.15
etas      -0.14
le        -0.14
WWW       -0.14
yleft     -0.14
Positive Logits
ibu       0.16
ongo      0.16
akis      0.16
ib        0.15
ONGO      0.15
ong       0.15
926       0.14
pret      0.14
Saud      0.14
opez      0.14
Activations Density: 0.044%