INDEX
Explanations
References to geographic or administrative regions, particularly ones labeled as "central."
Negative Logits
Token      Logit
ecs        -0.17
RIORITY    -0.16
full       -0.15
off        -0.15
kı         -0.15
aroo       -0.14
igkeit     -0.14
edom       -0.14
hood       -0.14
jar        -0.14
Positive Logits
Token      Logit
most       0.24
ised       0.21
ized       0.20
ization    0.19
amo        0.19
ities      0.18
cott       0.18
izing      0.17
izes       0.17
-most      0.16
Activation Density: 0.024%