Explanations
References to decision-making and change in the context of political or social issues
Head Attr Weights
0:0.01
1:0.01
2:0.05
3:0.04
4:0.14
5:0.02
6:0.08
7:0.44
8:0.03
9:0.03
10:0.04
11:0.06
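As a rough illustration of how to read the weights above, the sketch below copies them into a Python dict and picks out the head with the largest attribution. The variable names are hypothetical, and reading each entry as head index:weight is an assumption based on the layout of the listing.

```python
# Head attribution weights copied from the listing above
# (assumed layout: head index -> weight).
head_attr_weights = {
    0: 0.01, 1: 0.01, 2: 0.05, 3: 0.04, 4: 0.14, 5: 0.02,
    6: 0.08, 7: 0.44, 8: 0.03, 9: 0.03, 10: 0.04, 11: 0.06,
}

# Head with the largest attribution weight.
top_head = max(head_attr_weights, key=head_attr_weights.get)

# Sum of the listed weights; the displayed values are rounded,
# so the total need not be exactly 1.
total_weight = sum(head_attr_weights.values())

print(top_head, head_attr_weights[top_head], round(total_weight, 2))
```

Under this reading, head 7 carries the largest share of the attribution (0.44), well ahead of head 4 (0.14).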
Negative Logits
cade:-1.86
imentary:-1.73
dding:-1.55
Cru:-1.54
cour:-1.52
staking:-1.50
PLIED:-1.47
tips:-1.47
enary:-1.41
Dig:-1.41
Positive Logits
radically:1.75
fortunes:1.70
colours:1.68
drastically:1.60
colors:1.57
habits:1.55
tone:1.53
wording:1.51
dramatically:1.49
appearance:1.48
Activations Density: 0.088%