INDEX
Explanations
references to personal stories and shared experiences
Head Attribution Weights
Head  Weight
0     0.08
1     0.02
2     0.05
3     0.29
4     0.02
5     0.12
6     0.01
7     0.05
8     0.02
9     0.02
10    0.26
11    0.02
Negative Logits
ineligible   -2.29
incumb       -2.14
PACs         -2.07
tarians      -2.01
Defendants   -1.95
acists       -1.93
Presumably   -1.93
Liberals     -1.93
exceptions   -1.89
incumbent    -1.88
Positive Logits
secrets      2.84
insights     2.78
discoveries  2.56
tips         2.51
wisdom       2.49
Secrets      2.36
findings     2.36
insight      2.24
discovery    2.24
thoughts     2.23
Activation Density: 0.079%