Explanations
proper nouns, particularly names and organizations
Negative Logits
样的         -0.20
pawn         -0.17
-то          -0.15
erm          -0.15
er           -0.14
lluminate    -0.14
agini        -0.14
lings        -0.14
xuyên        -0.14
lot          -0.14
Positive Logits
ors          0.20
ively        0.17
ters         0.17
uers         0.16
ussen        0.15
mere         0.15
vast         0.14
ough         0.14
선           0.14
ルド         0.14
Activations Density 0.771%
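
The density figure above is presumably the fraction of sampled token positions on which the feature fires. The source does not define it, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1000 token positions:
# 992 positions where the feature is silent, 8 where it fires.
sample = [0.0] * 992 + [0.5, 1.2, 0.3, 2.0, 0.1, 0.7, 0.9, 1.5]
print(f"{activation_density(sample):.3%}")  # prints 0.800%
```

Under this reading, a density of 0.771% would mean the feature is active on roughly 8 of every 1000 sampled tokens.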