INDEX
    Explanations

    HTML table elements and their attributes

    Negative Logits
    Token           Logit
    "{{/"           -0.82
    "neſs"          -0.79
    "leſs"          -0.78
    " itſelf"       -0.78
    "verläs"        -0.74
    " ―――――"        -0.72
    " myſelf"       -0.71
    " tfsi"         -0.70
    "ſelves"        -0.70
    "nefs"          -0.68
    Positive Logits
    Token           Logit
    "<td>"          1.81
    "<th>"          1.13
    "<h1>"          0.90
    "<b>"           0.86
    "<h3>"          0.82
    "<blockquote>"  0.80
    "("             0.79
    " yoksa"        0.78
    "<i>"           0.78
    "<strong>"      0.76
    Activation Density: 0.001%

    No Known Activations