Explanations
references to countries and their contributions or roles in various contexts
Negative Logits
ída      -0.17
orem     -0.15
avel     -0.15
fte      -0.14
olar     -0.14
avan     -0.14
atron    -0.14
Antar    -0.13
hrd      -0.13
ート     -0.13
Positive Logits
backgrounds    0.26
whom           0.23
across         0.21
throughout     0.20
background     0.19
around         0.19
Background     0.17
Background     0.17
different      0.17
diverse        0.16
Activations Density 0.055%