INDEX
    Explanations

    various numeric patterns or representations

    Negative Logits
    ness          -0.81
    ers           -0.76
    ↵↵            -0.75
    (not shown)   -0.75
    <sup>         -0.70
    er            -0.70
    an            -0.69
    ja            -0.67
    <h2>          -0.67
    en            -0.66
    Positive Logits
     }}$}         1.40
    "}            1.38
    "]}           1.36
    '}            1.31
    ']}           1.30
    ")}           1.29
    ]")]          1.27
     "}           1.22
    ).}           1.12
    ')}           1.11
    Activation Density: 0.288%

    No Known Activations