INDEX
    Explanations
    Negative Logits
    Token             Logit
    "<i>"             -0.65
    "<sup>"           -0.63
    "<blockquote>"    -0.61
    "<bos>"           -0.59
    "<u>"             -0.59
    "<em>"            -0.58
    "<b>"             -0.57
    "<h2>"            -0.54
                      -0.50
    "/"               -0.50
    Positive Logits
    Token             Logit
    "\%"              1.37
    "\%)"             0.95
    "\%,"             0.89
    "\#"              0.84
    "\#:"             0.81
    " \%"             0.81
    "ValueStyle"      0.76
    " queſta"         0.75
    " betweenstory"   0.75
    " (\%"            0.73
    Activation Density: 0.035%

    No Known Activations