Explanations
proper nouns and names, particularly those related to individuals and organizations
Negative Logits
ed      -0.17
eper    -0.16
امة     -0.14
ず      -0.13
mie     -0.13
seller  -0.13
mium    -0.13
itos    -0.13
Tong    -0.13
s       -0.13
Positive Logits
śmy     0.20
yas     0.17
yat     0.17
pter    0.16
apolis  0.16
ement   0.15
ach     0.15
rices   0.15
ylland  0.15
ście    0.15
Activations Density 0.053%