Explanations
phrases indicating claims, decisions, or assertions regarding political actions or accountability
Head Attr Weights
0:0.03
1:0.02
2:0.11
3:0.08
4:0.13
5:0.05
6:0.05
7:0.05
8:0.08
9:0.08
10:0.13
11:0.13
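As a quick reading aid, the weights above can be ranked to see which heads carry the most attribution. This is a minimal sketch, assuming each entry is a `head:weight` pair for heads 0 through 11; the variable names are illustrative and not part of the dashboard.

```python
# Head attribution weights copied from the list above.
# Assumption: keys are attention-head indices, values are attribution weights.
head_attr_weights = {
    0: 0.03, 1: 0.02, 2: 0.11, 3: 0.08, 4: 0.13, 5: 0.05,
    6: 0.05, 7: 0.05, 8: 0.08, 9: 0.08, 10: 0.13, 11: 0.13,
}

# Rank heads by descending weight, breaking ties by head index.
ranked = sorted(head_attr_weights.items(), key=lambda kv: (-kv[1], kv[0]))

# Heads sharing the maximum weight.
max_weight = ranked[0][1]
top_heads = [head for head, weight in ranked if weight == max_weight]

# Sum of the listed (rounded) weights; it need not be exactly 1.
total = sum(head_attr_weights.values())

for head, weight in ranked:
    print(f"head {head:>2}: {weight:.2f}")
print("top heads:", top_heads)
print(f"sum of listed weights: {total:.2f}")
```

Under that assumption, heads 4, 10, and 11 tie for the largest weight (0.13), and the listed values sum to 0.94 rather than 1, which may simply reflect rounding to two decimals.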
Negative Logits
#$            -1.43
(&            -1.37
Bastard       -1.33
verages       -1.32
sbm           -1.30
Bundy         -1.27
Belief        -1.24
Awesome       -1.24
Break         -1.23
CLASS         -1.22
Positive Logits
).[           1.30
boutique      1.25
hers          1.24
later         1.23
ordable       1.22
respectively  1.19
domestically  1.19
creditor      1.16
".[           1.15
ourt          1.15
Activations Density 0.017%