INDEX
Explanations
references to repetition or similarity
Head Attr Weights
0:0.02
1:0.02
2:0.16
3:0.05
4:0.14
5:0.03
6:0.16
7:0.20
8:0.06
9:0.02
10:0.04
11:0.04
Negative Logits
quartered    -1.92
Cosponsors   -1.40
weight       -1.36
arning       -1.32
elected      -1.31
arra         -1.28
Compare      -1.28
WATCHED      -1.26
condem       -1.24
adelphia     -1.20
Positive Logits
goodies      1.50
bish         1.44
uid          1.43
originals    1.37
goodness     1.36
nel          1.35
rio          1.35
clutter      1.20
lik          1.18
arious       1.18
Activations Density: 0.006%