INDEX
Explanations
No Explanations Found
Negative Logits
<eos>               2.37
↵↵                  2.04
<start_of_image>    1.83
<strong>            1.78
↵↵↵                 1.69
<b>                 1.58
↵↵↵↵                1.49
.                   1.47
<em>                1.44
[no visible token]  1.42
Positive Logits
<unused1316>        2.28
<unused1324>        2.26
<unused1520>        2.24
<unused1398>        2.23
<unused1322>        2.23
<unused1525>        2.23
<unused1293>        2.22
<unused1517>        2.22
<unused1291>        2.22
<unused1333>        2.22
Activations Density: 0.164%