Explanations
Statements or claims about facts and attributes related to government or national identity
Head Attribution Weights
Head 0: 0.02
Head 1: 0.01
Head 2: 0.06
Head 3: 0.15
Head 4: 0.07
Head 5: 0.07
Head 6: 0.05
Head 7: 0.12
Head 8: 0.03
Head 9: 0.03
Head 10: 0.14
Head 11: 0.20
Negative Logits
idth          -1.56
ciation       -1.31
pad           -1.30
earch         -1.26
speak         -1.26
HUD           -1.25
landing       -1.24
placeholder   -1.22
hiring        -1.19
ifully        -1.17
Positive Logits
worms         1.49
wcsstore      1.43
龍�           1.29
Tablet        1.24
idium         1.21
apo           1.18
Invaders      1.18
ROR           1.18
();           1.17
Gems          1.17
Activation Density: 0.011%