Explanations
references to government and political figures
Negative Logits
ROTO       -0.09
atem       -0.08
eldom      -0.08
eldorf     -0.08
bbe        -0.07
.vaadin    -0.07
grily      -0.07
radan      -0.07
ikal       -0.07
redd       -0.07
Positive Logits
exp        0.06
federal    0.05
Fluent     0.05
NAS        0.05
demon      0.05
predicate  0.05
stripe     0.05
Liz        0.05
åºŃ        0.05
iyel       0.05
Activations Density: 0.092%