INDEX
Explanations
locations mentioned in the text
geographical locations and cities
Negative Logits
Token        Logit
zers         -0.74
odic         -0.71
thood        -0.70
agine        -0.68
sbm          -0.66
iosyncr      -0.65
afer         -0.64
cules        -0.63
omorph       -0.63
onym         -0.63
Positive Logits
Token        Logit
WA           1.02
VA           0.98
TX           0.97
TN           0.94
MA           0.89
Ontario      0.89
KS           0.88
CA           0.88
KY           0.88
FL           0.87
Activations Density: 0.063%