    Explanations

    file paths and references in code

    Negative Logits
    Token           Logit
    "<"             -0.46
    "to"            -0.44
    " ("            -0.44
    "4"             -0.43
    "<u>"           -0.43
    "8"             -0.43
    (whitespace)    -0.42
    (not shown)     -0.42
    "1"             -0.42
    "a"             -0.41
    Positive Logits
    Token             Logit
    majánló           0.95
    adaptiveStyles    0.93
    ſammen            0.93
    <unused52>        0.91
    <unused79>        0.91
    <unused14>        0.91
    <unused23>        0.91
    [@BOS@]           0.91
    <unused28>        0.91
    <unused8>         0.90
    Activation Density: 0.347%

    No Known Activations