Explanations
references to halls of fame
Negative Logits
gel       -0.16
elman     -0.15
öl        -0.15
INC       -0.15
nt        -0.14
geh       -0.14
-feed     -0.14
ington    -0.14
ocrat     -0.14
upil      -0.14
Positive Logits
Fame        0.39
fame        0.33
fam         0.21
shame       0.19
indu        0.18
induction   0.18
Fam         0.18
Shame       0.17
duct        0.16
legg        0.16
Activations Density: 0.006%