Explanations
geographical locations and their descriptions
Negative Logits
Token       Logit
capitals    -0.17
holm        -0.16
cities      -0.15
Bever       -0.15
Manila      -0.14
UBLE        -0.14
Cities      -0.14
angkan      -0.14
amburger    -0.13
andum       -0.13
Positive Logits
Token           Logit
Oregon          0.18
Illinois        0.17
Maryland        0.16
Pennsylvania    0.16
Ohio            0.16
Kentucky        0.15
LETE            0.15
Michigan        0.15
Missouri        0.15
SE              0.15
Activations Density 0.133%