INDEX
Explanations
proper nouns, particularly names of individuals and organizations
Negative Logits
lingen     -0.16
lymp       -0.16
stown      -0.15
ród        -0.15
adelphia   -0.15
atown      -0.14
ervised    -0.14
governing  -0.14
orama      -0.14
ợ          -0.14
Positive Logits
iaux       0.18
CONTRIBUT  0.18
aint       0.17
ides       0.17
III        0.15
essian     0.15
ä¸ī级     0.15
avage      0.15
ìϏ         0.15
.gmail     0.15
Activations Density 0.308%