Explanations
References to events occurring in cities, particularly atrocities or other significant events
Negative Logits
Son           -0.17
uld           -0.16
htags         -0.16
odont         -0.15
Williamson    -0.15
afen          -0.15
umbs          -0.15
Ģ             -0.14
unga          -0.14
Pod           -0.14

Positive Logits
itos          0.16
ornado        0.15
ripp          0.15
cracking      0.15
éĴŁ           0.15
.de           0.15
ephir         0.15
èªł           0.14
-Mart         0.14
ooter         0.14
Activations Density 0.059%