    Explanations

    user requests and content generation

    Negative Logits
    ","",        1.29
    %","         1.11
    ",'          1.11
    (),"         1.08
     ","         1.03
     ()=>        1.00
     :"          0.99
     ','         0.99
    *****",      0.94
     ,"          0.93
    Positive Logits
    </h2>               2.36
    </h1>               2.01
    </h3>               1.87
    <start_of_image>    1.80
     阅读全文            1.61
    </blockquote>       1.51
     […]                1.51
    </td>               1.45
     [...]              1.43
    <0x0D>              1.42
    Act Density 6.753%

    No Known Activations