INDEX
Explanations
Attends to error-related tokens in code segments that handle exceptions or perform error checking.
Head Attr Weights
0:0.02
1:0.55
2:0.06
3:0.02
4:0.04
5:0.17
6:0.05
7:0.05
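As a quick way to read the head attribution weights, the sketch below ranks the heads and reports how much of the listed total the top two account for. The numbers are copied from the listing; the variable names are illustrative, and treating the weights as comparable per-head shares is an assumption rather than something the page states.

```python
# Head attribution weights copied from the listing (head index -> weight).
head_weights = {0: 0.02, 1: 0.55, 2: 0.06, 3: 0.02,
                4: 0.04, 5: 0.17, 6: 0.05, 7: 0.05}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The listed weights are rounded, so the total need not be exactly 1.
total = sum(head_weights.values())

# Assumption: weights are comparable shares, so a ratio is meaningful.
top_two_share = (ranked[0][1] + ranked[1][1]) / total

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"top two heads cover {top_two_share:.0%} of the listed total")
```

On these values, head 1 dominates and head 5 is a distant second; the remaining six heads each contribute 0.06 or less.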
Negative Logits
myſelf     -1.70
itſelf     -1.66
Efq        -1.58
Theſe      -1.51
Monfieur   -1.48
ſelf       -1.44
―――――      -1.43
pleaſure   -1.42
ſeveral    -1.41
Jefus      -1.41
Positive Logits
(blank)    0.85
(blank)    0.84
(          0.79
.          0.74
in         0.71
<eos>      0.69
,          0.67
of         0.67
(blank)    0.66
[          0.66
Activations Density 0.033%