Explanations
mentions of specific geographic locations and institutions associated with them
Negative Logits
et        -0.32
um        -0.31
hip       -0.29
ets       -0.28
erver     -0.27
c         -0.26
h         -0.26
elf       -0.26
ox        -0.26
oft       -0.26
Positive Logits
y         0.25
dale      0.18
bum       0.17
datal     0.16
linger    0.15
olutely   0.15
plorer    0.15
dsl       0.15
dag       0.15
dig       0.15
Activations Density: 0.230%