Explanations
references to locations, specifically the mention of "New York City"
Negative Logits
Token     Logit
rar       -0.80
terness   -0.78
hement    -0.75
riet      -0.75
icum      -0.73
iru       -0.72
nir       -0.71
scrut     -0.71
igham     -0.70
phabet    -0.70
Positive Logits
Token          Logit
subway         1.09
borough        1.00
FC             0.97
skyline        0.97
scape          0.95
landmarks      0.93
neighborhoods  0.92
Mayor          0.92
skysc          0.91
streets        0.91
Activations Density 0.049%