INDEX
    Explanations

    code, neural network size parameters

    Negative Logits
    Token           Logit
    " Carolyn"      -0.08
    " tipping"      -0.08
    " Maggie"       -0.07
    "Plural"        -0.07
    "_green"        -0.07
    "قد"            -0.07
    " Willie"       -0.07
    " contours"     -0.07
    "कारी"          -0.07
    "ioxide"        -0.07
    Positive Logits
    Token           Logit
    " currently"    0.08
    "вам"           0.08
    " oth"          0.08
    "[random"       0.08
    "<a"            0.08
    " temperat"     0.07
    ".learning"     0.07
    " mutation"     0.07
    " afer"         0.07
    " new"          0.07
    Activation Density: 0.001%

    No Known Activations