Explanations
proper nouns, particularly names of individuals and organizations
Negative Logits
Token        Logit
.intellij    -0.16
ors          -0.15
æĹ           -0.15
653          -0.15
UPLE         -0.14
оÑģп         -0.14
dut          -0.14
eland        -0.14
ahat         -0.14
ito          -0.14
Positive Logits
Token        Logit
gloss         0.17
eneg          0.15
-Cs           0.15
swick         0.14
Bri           0.14
isci          0.14
warz          0.14
/terms        0.14
closely       0.14
ynes          0.14
Activations Density 0.046%