INDEX
    Explanations

    numerical values and their significance in a context

    Negative Logits
    Token    Logit
    y        -0.28
    a        -0.27
    bas      -0.27
    p        -0.27
    is       -0.27
    by       -0.26
    w        -0.26
             -0.25
    ja       -0.24
    g        -0.24
    Positive Logits
    Token         Logit
    <unused41>    0.99
    [@BOS@]       0.99
    <unused79>    0.99
    <unused17>    0.98
    <unused28>    0.98
    <unused14>    0.98
    <unused42>    0.98
    <unused47>    0.98
    <unused43>    0.98
    <unused3>     0.98
    Activation Density: 0.145%

    No Known Activations