INDEX
    Explanations

    instances of numbers and code formatting elements

    code or sentence starters

    Negative Logits
    Token           Logit
    " zwiſchen"     -0.72
    "majánló"       -0.71
    "[@BOS@]"       -0.69
    "<unused8>"     -0.69
    "<unused47>"    -0.69
    "<unused79>"    -0.69
    "<unused28>"    -0.69
    "<unused23>"    -0.69
    "<unused14>"    -0.69
    "<unused16>"    -0.69
    Positive Logits
    Token           Logit
    " originally"   0.44
    " nahilalakip"  0.39
    " Originally"   0.39
    " angelegt"     0.37
    "Originally"    0.36
    " The"          0.35
    " Although"     0.34
    " Though"       0.33
    "Though"        0.32
    "The"           0.32
    Activation Density: 0.004%

    No Known Activations