Explanations
instances of the word "national" and its variations in various contexts
Negative Logits
ext        -0.18
gh         -0.18
nice       -0.17
ory        -0.17
nations    -0.16
h          -0.16
nice       -0.15
ature      -0.15
Nations    -0.15
eden       -0.14
Positive Logits
istic       0.31
ities       0.28
/local      0.27
ized        0.25
istically   0.24
izing       0.23
/reg        0.23
anthem      0.22
ization     0.21
/global     0.21
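The two logit lists rank vocabulary tokens by how strongly this feature's output direction raises or lowers their logits. The sketch below shows one common way such lists can be produced, assuming the effect of a token is the dot product of the feature's decoder direction with that token's unembedding vector (dashboards may apply extra normalization, which is not modeled here). The function names `logit_effects` and `top_logit_effects` and the toy numbers are hypothetical illustrations, not the dashboard's actual code or data.

```python
from typing import List, Sequence, Tuple

TokenEffect = Tuple[str, float]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[TokenEffect]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: `unembed` is laid out as d_model rows by vocab_size columns,
    so column j is the unembedding vector for vocab[j].
    """
    return [
        (tok, sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir))))
        for j, tok in enumerate(vocab)
    ]


def top_logit_effects(
    effects: Sequence[TokenEffect], k: int
) -> Tuple[List[TokenEffect], List[TokenEffect]]:
    """Return the k most positive and the k most negative (token, effect) pairs.

    A list of pairs is used instead of a dict because distinct tokens can
    render identically (e.g. "nice" vs " nice" once whitespace is stripped).
    """
    ranked = sorted(effects, key=lambda item: item[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-dimensional model with a 3-token vocabulary.
    effects = logit_effects(
        decoder_dir=[1.0, 0.5],
        unembed=[[0.2, -0.1, 0.0], [0.4, 0.0, -0.6]],
        vocab=["istic", "nations", "ext"],
    )
    positive, negative = top_logit_effects(effects, k=1)
    print("positive:", positive)
    print("negative:", negative)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.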
Activations Density 0.035%
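Activation density is usually read as the fraction of tokens on which the feature fires at all; 0.035% corresponds to roughly 35 firing tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold is not stated); `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 35 firing tokens out of 100,000.
    acts = [0.0] * 99_965 + [1.2] * 35
    density = activation_density(acts)
    print(f"Activations Density {density * 100:.3f}%")
```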