    Negative Logits
    " bezeichneter"     -0.66
    (token not shown)   -0.61
    "GNA"               -0.61
    "égories"           -0.60
    " hers"             -0.60
    " housse"           -0.58
    "=$?"               -0.57
    "Приятного"         -0.57
    " Segal"            -0.57
    " \/"               -0.57
    Positive Logits
    "<blockquote>"      3.63
    "</blockquote>"     1.32
    "<h5>"              1.11
    "<h4>"              1.09
    "blockquote"        1.04
    "<code>"            1.02
    "<h1>"              1.01
    "<em>"              0.99
    "<h6>"              0.97
    "<td>"              0.95
    Activation Density: 0.045%

    No Known Activations