Explanations
references to national identity and related themes
Negative Logits
orch          -0.17
liness        -0.16
ext           -0.16
ally          -0.16
ius           -0.15
gh            -0.15
oro           -0.15
als           -0.15
ing           -0.15
wards         -0.15
Positive Logits
ities         0.30
istic         0.30
istically     0.23
-security     0.21
Anthem        0.21
ités          0.21
ité           0.20
anthem        0.20
ISTIC         0.19
ity           0.19
Activations Density: 0.026%