INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.05
2:0.09
3:0.08
4:0.09
5:0.09
6:0.08
7:0.08
8:0.09
9:0.07
10:0.08
11:0.08
Negative Logits
Downloadha: -1.93
DOI: -1.78
disproportionately: -1.73
averaged: -1.71
Zup: -1.70
Wem: -1.66
totaled: -1.63
traditionally: -1.61
median: -1.58
doi: -1.54
Positive Logits
Dialogue: 1.77
SHIP: 1.58
NK: 1.58
RAFT: 1.55
DAY: 1.54
assi: 1.53
UGH: 1.51
lett: 1.49
UGE: 1.48
ERROR: 1.47
Activations Density: 0.000%
No Known Activations
This feature has no known activations.