Explanations
references to geographical locations and cities
Negative Logits
Token    Logit
lix      -0.17
oman     -0.16
/Gate    -0.15
nej      -0.14
andy     -0.14
sed      -0.14
nap      -0.14
akis     -0.14
yna      -0.14
751      -0.14
Positive Logits
Token    Logit
Perm      0.28
Astr      0.28
Perm      0.26
Od        0.24
Saint     0.23
Vor       0.22
Sm        0.22
Nov       0.22
Mos       0.21
Barn      0.20
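The page does not say how these logit lists are computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive tokens. The sketch below ignores any final layer norm. The function name `logit_effects`, the matrix layout (`d_model` rows by vocabulary columns), and all toy values are hypothetical.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every vocabulary token by projecting a feature's decoder
    direction through the unembedding matrix (assumed layout: d_model rows,
    one column per token), then return the k most negative tokens
    (most negative first) and the k most positive (most positive first)."""
    d_model = len(decoder_dir)
    scores = []
    for tok_idx, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][tok_idx] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2, a vocabulary of three tokens.
decoder_dir = [1.0, 0.0]
unembed = [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]]
vocab = ["tok_a", "tok_b", "tok_c"]
neg, pos = logit_effects(decoder_dir, unembed, vocab, k=1)
print(neg)  # [('tok_b', -0.2)]
print(pos)  # [('tok_a', 0.5)]
```

The orderings match the lists above: negative logits run from most to least negative, positive logits from largest to smallest.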
Activation Density: 0.079%
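Activation density is usually read as the share of sampled tokens on which the feature is active. The minimal sketch below assumes "active" means an activation strictly above a threshold of zero. The function name `activation_density` and the toy activations are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy activations over four tokens: the feature fires on one of them.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```

Under this reading, a density of 0.079% means the feature fires on roughly 79 of every 100,000 sampled tokens.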