INDEX
Explanations
references to significant awards or recognitions
Head Attribution Weights
0:0.17
1:0.04
2:0.01
3:0.10
4:0.24
5:0.08
6:0.07
7:0.03
8:0.10
9:0.09
10:0.01
11:0.02
Negative Logits
"inappropriately"  -1.83
"icient"  -1.72
"gging"  -1.65
"mism"  -1.63
"indefinite"  -1.63
"sideline"  -1.63
"emetery"  -1.62
"innocence"  -1.61
"contradictory"  -1.61
"conflicting"  -1.60
Positive Logits
")"  2.47
"EVER"  2.42
"►"  2.39
"]"  2.27
"),"  2.22
"):"  2.22
"✓"  2.19
"·"  2.19
"Posted"  2.19
"!"  2.16
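The positive and negative logit lists appear to rank vocabulary tokens by how strongly this feature pushes each token's logit up or down. A minimal sketch of one common way such lists are produced follows. It assumes (this is not stated on the page) that a token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The functions `logit_effects` and `top_and_bottom` and the toy two-dimensional vectors are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the output logits.
# Assumption: effect(token) = dot(decoder_direction, unembedding column for token).

def logit_effects(decoder_direction, unembed_columns):
    """Return {token: dot(decoder_direction, column)} for every token."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed_columns.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy 2-d decoder direction and unembedding columns (illustrative values only).
    decoder = [1.0, -0.5]
    unembed = {
        "EVER": [2.0, 0.0],
        ")": [1.0, -1.0],
        "innocence": [-1.0, 1.0],
        "mism": [0.0, 2.0],
    }
    pos, neg = top_and_bottom(logit_effects(decoder, unembed), 2)
    print(pos)  # [('EVER', 2.0), (')', 1.5)]
    print(neg)  # [('innocence', -1.5), ('mism', -1.0)]
```

In a real model the vectors would have thousands of dimensions and the vocabulary tens of thousands of tokens, but the ranking step is the same.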
Activation Density: 0.019%
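A density of 0.019% corresponds to about 19 active tokens in every 100,000. A minimal sketch of the calculation, assuming (not stated on the page) that density means the percentage of tokens on which the feature's activation exceeds zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
# Hypothetical sketch: activation density as the percentage of tokens where the
# feature's activation is above a threshold (assumed to be 0 by default).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # 19 firing tokens out of 100,000 gives roughly the density reported above.
    sample = [0.0] * 99_981 + [0.5] * 19
    print(activation_density(sample))
```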