Explanations
references to encyclopedic content
Head Attr Weights
0:0.01
1:0.01
2:0.08
3:0.05
4:0.04
5:0.02
6:0.15
7:0.28
8:0.04
9:0.06
10:0.17
11:0.05
Negative Logits
veto         -1.82
vs           -1.74
opposition   -1.72
retaliation  -1.69
versus       -1.68
boo          -1.66
retali       -1.62
AME          -1.58
overpower    -1.55
trem         -1.55
Positive Logits
mathemat            2.05
Journals            1.94
cyclopedia          1.94
Teaching            1.90
externalActionCode  1.84
arthed              1.84
lished              1.78
endium              1.77
readable            1.76
enment              1.75
Activations Density: 0.000%