INDEX
    Explanations

    gradient descent

    Negative Logits
    " Huffman"      -0.06
    "\tDocument"    -0.06
    " vot"          -0.06
    "ifiant"        -0.06
    "kh"            -0.06
    "whether"       -0.06
    "/ay"           -0.06
                    -0.06
    "preg"          -0.06
    " buiten"       -0.06

    Positive Logits
    " pardon"        0.06
    "Dt"             0.06
    "cuador"         0.06
    "SD"             0.06
    " ascent"        0.06
    " Dt"            0.06
    "ọi"             0.06
    " Onion"         0.06
    "_DEL"           0.06
    " شن"            0.06

    Act Density 0.002%

    No Known Activations