Explanations
dates or locations from news articles
references to "New" in various contexts and locations
Negative Logits
Token       Logit
stood       -0.76
ppe         -0.74
osate       -0.72
agn         -0.70
uddin       -0.68
mop         -0.68
abet        -0.65
ascript     -0.64
heit        -0.63
aminer      -0.63
Positive Logits
Token       Logit
YORK         1.29
foundland    1.15
NEW          1.03
PORT         1.02
CAST         0.89
Orleans      0.86
TERN         0.84
York         0.83
Zealand      0.82
ARK          0.82
Activation density: 0.005%