Explanations
mentions of goals, issues, and perspectives related to societal and economic concerns
Head Attr Weights
0:0.14
1:0.13
2:0.02
3:0.11
4:0.07
5:0.05
6:0.04
7:0.01
8:0.17
9:0.15
10:0.05
11:0.01
Negative Logits
sembly: -2.12
elect: -2.01
liament: -1.90
ixed: -1.89
semble: -1.88
airo: -1.86
lett: -1.79
ogram: -1.78
BO: -1.77
helial: -1.77
Positive Logits
these: 2.47
madness: 2.26
notions: 2.25
scenarios: 2.22
shenanigans: 2.21
misconceptions: 2.17
assumptions: 2.17
these: 2.16
occurrences: 2.14
motivations: 2.13
Activations Density 0.110%