Explanations
concepts related to worldview or perspective
Head Attr Weights
0:0.12
1:0.05
2:0.07
3:0.05
4:0.17
5:0.14
6:0.08
7:0.05
8:0.07
9:0.06
10:0.04
11:0.05
Negative Logits
icient    -2.10
alone     -2.04
unda      -1.96
ombies    -1.96
dule      -1.94
ccoli     -1.93
oreal     -1.91
ayn       -1.89
worth     -1.89
ptin      -1.88
Positive Logits
natureconservancy    2.22
Pirate               1.83
Democracy            1.77
�士                   1.74
Pug                  1.73
Lobby                1.72
country              1.71
Pirates              1.70
CLIENT               1.69
backdrop             1.68
Activations Density: 0.000%