Explanations
places and names related to political events or international disputes
Negative Logits
Token      Logit
imore      -0.77
aroo       -0.76
ruary      -0.74
quished    -0.69
>>\        -0.65
Pigs       -0.63
birth      -0.61
rules      -0.61
Engels     -0.59
yrim       -0.58
Positive Logits
Token      Logit
ãĤ£        0.69
ĸļ         0.64
ecast      0.61
inki       0.60
Marie      0.59
Cth        0.58
ciating    0.56
ilo        0.56
ibur       0.56
uku        0.56
Activations Density: 2.010%