Explanations
references to the city of London
Negative Logits
adu       -0.16
fas       -0.15
agon      -0.15
lessly    -0.15
219       -0.15
tors      -0.15
tie       -0.14
220       -0.14
hardt     -0.14
ez        -0.14
Positive Logits
ers           0.28
Bridge        0.24
er            0.24
Underground   0.23
istan         0.20
WC            0.20
Borough       0.19
WC            0.19
inium         0.19
ERS           0.19
Activations Density: 0.013%