INDEX
Explanations
instances of disappointment or failed expectations
Head Attr Weights
0:0.02
1:0.02
2:0.09
3:0.39
4:0.06
5:0.03
6:0.04
7:0.07
8:0.05
9:0.05
10:0.06
11:0.08
Negative Logits
":["           -1.76
"?             -1.63
icago          -1.57
?,             -1.53
Annotations    -1.48
Conquer        -1.43
WATCH          -1.42
Reviewed       -1.38
?:             -1.38
":[            -1.36
Positive Logits
succumbed      2.10
etheless       1.97
nonetheless    1.52
conflic        1.51
watered        1.50
tragically     1.48
unnecess       1.46
corrupted      1.41
quite          1.40
nevertheless   1.40
Activations Density 0.064%
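
The head attribution weights listed above can be ranked to see which head dominates. The following is a minimal sketch, assuming the listed values are per-head attribution weights keyed by head index (0 to 11); the names `head_weights` and `rank_heads` are hypothetical and are not part of the original tool.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry "i:w" means head i has attribution weight w.
head_weights = {
    0: 0.02, 1: 0.02, 2: 0.09, 3: 0.39,
    4: 0.06, 5: 0.03, 6: 0.04, 7: 0.07,
    8: 0.05, 9: 0.05, 10: 0.06, 11: 0.08,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]

# The rounded weights need not sum to exactly 1, so normalize by the listed total.
total = sum(head_weights.values())
top_share = top_weight / total

print(f"top head: {top_head}, weight: {top_weight:.2f}, share of listed total: {top_share:.1%}")
```

Under this reading, head 3 carries by far the largest weight, with every other head at 0.09 or below.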