    Negative Logits
    " "       1.26
    " ("      1.22
    " #"      1.13
    " {"      1.13
    " a"      1.12
    "s"       1.10
    " ["      1.10
    " un"     1.09
    " ​​"      1.08
    " ein"    1.04
    Positive Logits
    <unused1224>    1.76
    <unused932>     1.75
    <unused1939>    1.74
    <unused313>     1.72
    <unused1871>    1.72
    <unused2025>    1.71
    <unused167>     1.71
    <unused2037>    1.70
    <unused312>     1.70
    <unused1182>    1.69
    Act Density 0.120%

    No Known Activations