    Negative Logits
        " This"               0.74
        " "                   0.72
        "This"                0.70
        " this"               0.68
        "↵↵"                  0.67
        "["                   0.67
        " h"                  0.67
        " H"                  0.66
        " P"                  0.66
        "<start_of_image>"    0.66

    Positive Logits
        "<unused369>"         1.15
        "<unused427>"         1.13
        "<unused327>"         1.12
        "<unused735>"         1.11
        "<unused309>"         1.10
        "<unused307>"         1.09
        "<unused724>"         1.08
        "<unused325>"         1.08
        "<unused505>"         1.08
        "<unused389>"         1.07
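The Negative and Positive Logits lists above rank vocabulary tokens by a per-token score. A common way such lists are built for sparse-autoencoder features, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest scores. The page does not say how it signs or scales the displayed values. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy data, with plain lists instead of a tensor library.

```python
# Hypothetical sketch (assumption): a feature's logit effect on token j is the
# dot product of its decoder vector with column j of the unembedding matrix.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: score}. `unembed` is a d_model x vocab list of rows."""
    effects = {}
    for j, tok in enumerate(vocab):
        effects[tok] = sum(d * row[j] for d, row in zip(decoder_vec, unembed))
    return effects

def top_and_bottom(effects, k):
    """Return (k most-promoted tokens, k most-suppressed tokens)."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy example: d_model = 2, vocabulary of three tokens.
effects = logit_effects([1.0, 1.0], [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]], ["a", "b", "c"])
pos, neg = top_and_bottom(effects, 1)
print(pos, neg)  # [('b', 2.0)] [('c', -1.0)]
```

Sorting twice keeps the sketch short; for a real vocabulary of hundreds of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.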
    Activation Density 0.001%

    No Known Activations
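The activation density figure above is a percentage. If it is read as the share of sampled tokens on which the feature fires (an assumption; the page states neither the firing threshold nor the sample size), 0.001% works out to roughly one token in 100,000. The minimal sketch below assumes a threshold of "activation strictly greater than zero"; `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# where a feature's activation exceeds an assumed threshold of zero.

def activation_density(activations):
    """Percentage of positions with activation > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# A feature that fires on 1 of 100,000 tokens has a density of 0.001%.
acts = [0.0] * 100_000
acts[42] = 3.5
print(f"{activation_density(acts):.3f}%")  # 0.001%
```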