INDEX
Explanations
References to "nation" or its variations, signaling a focus on national topics or issues.
Negative Logits
Token     Logit
orie      -0.17
sse       -0.17
ly        -0.17
lyn       -0.17
ors       -0.16
ory       -0.15
dater     -0.15
ÑģÑı      -0.15
leaf      -0.15
orer      -0.14
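
The token ÑģÑı in the negative list looks like a byte-level BPE vocabulary string rather than readable text. The page does not say which tokenizer the model uses, so the following is only a minimal sketch under the assumption that it is a GPT-2 style byte-to-unicode table. The function names are hypothetical and chosen for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values to printable characters.

    Assumption: the dashboard's model uses this byte-level BPE convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse map: printable character -> original byte value.
_UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into UTF-8 text."""
    raw = bytes(_UNICODE_TO_BYTE[ch] for ch in token)
    # Partial UTF-8 sequences are possible for single tokens, so replace errors.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_byte_level_token("ÑģÑı"))  # prints "ся" under this assumption
```

If the assumption holds, the entry corresponds to the Cyrillic string "ся"; plain ASCII tokens such as "wide" pass through unchanged.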
Positive Logits
Token     Logit
wide      0.28
hood      0.25
nal       0.23
-wide     0.22
alse      0.22
ally      0.22
istic     0.21
/world    0.21
-states   0.19
ality     0.19
Activations Density: 0.015%