Explanations
personal names
proper names, particularly those related to individuals and their affiliations
Negative Logits
Token      Logit
eter       -0.93
arian      -0.88
ition      -0.83
axter      -0.82
iaries     -0.81
entimes    -0.78
ary        -0.78
erry       -0.77
eln        -0.76
VB         -0.76
Positive Logits
Token        Logit
xia          0.75
cipline      0.74
ãĤ¼ãĤ¦ãĤ¹    0.74
=-=-         0.72
��������     0.71
@@@@         0.70
ãĥīãĥ©       0.70
ãĥĨ          0.70
sshd         0.70
66666666     0.69
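Several of the positive-logit tokens (for example ãĤ¼ãĤ¦ãĤ¹) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes, without confirmation from the source, that the model uses the GPT-2 style byte-to-character mapping. Under that assumption it rebuilds the mapping and decodes such tokens back to UTF-8. The names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `decode_token` are illustrative and do not belong to the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the tokenizer behind this dashboard follows that scheme.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to code points 256 and up.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into readable text.

    Raises KeyError for characters outside the table, such as the
    U+FFFD replacement characters, whose original bytes are lost.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Tokens may hold partial UTF-8 sequences, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¼ãĤ¦ãĤ¹", "ãĥīãĥ©", "ãĥĨ", "sshd"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the three byte-level entries decode to the katakana strings ゼウス, ドラ, and テ, while plain ASCII tokens such as sshd are unchanged. The entry shown as replacement characters cannot be recovered this way.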
Activations Density: 0.031%
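To put the density figure in perspective, here is a minimal sketch. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero. The source does not define it, and both the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: this mirrors how the dashboard's density is defined.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: 31 active tokens out of 100,000.
    sample = [1.0] * 31 + [0.0] * 99_969
    print(f"{activation_density(sample):.3%}")
```

Under that reading, 0.031% corresponds to roughly 31 active tokens per 100,000, which makes this a sparse feature.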