    Explanations

    punctuation marks and their variations

    numbered lists and references

    Negative Logits
    Token    Logit
    ad       -0.29
    Why      -0.27
             -0.27
    (        -0.27
    What     -0.26
             -0.26
    ta       -0.25
    P        -0.24
    De       -0.24
             -0.24
    Positive Logits
    Token            Logit
     Infórmanos      0.98
    enterOuterAlt    0.95
     EconPapers      0.90
    [@BOS@]          0.89
    <unused23>       0.88
    <unused8>        0.88
    <unused43>       0.88
    <unused68>       0.88
    <unused79>       0.88
    <pad>            0.88
    Act Density 0.105%

    No Known Activations