INDEX
Explanations
proper nouns related to people's names and roles
Negative Logits
            -0.08
tober       -0.08
illow       -0.07
otropic     -0.07
linger      -0.07
ëģĶ         -0.06
-fw         -0.06
uation      -0.06
etary       -0.06
yd          -0.06
Positive Logits
à¹Īวม      0.08
/or         0.08
stown       0.07
izes        0.07
Alexand     0.07
ÑģÑĮ        0.07
/as         0.07
urm         0.07
mente       0.07
prites      0.07
Activations Density 0.120%