INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    <eos>             2.37
    ↵↵                2.04
    <start_of_image>  1.83
    <strong>          1.78
    ↵↵↵               1.69
    <b>               1.58
    ↵↵↵↵              1.49
    .                 1.47
    <em>              1.44
    (whitespace)      1.42
    POSITIVE LOGITS
    <unused1316>  2.28
    <unused1324>  2.26
    <unused1520>  2.24
    <unused1398>  2.23
    <unused1322>  2.23
    <unused1525>  2.23
    <unused1293>  2.22
    <unused1517>  2.22
    <unused1291>  2.22
    <unused1333>  2.22
    Act Density 0.164%

    No Known Activations