INDEX
    NEGATIVE LOGITS
    Token       Logit
    "colhead"   -0.56
    "şört"      -0.53
    " poř"      -0.51
    "nup"       -0.47
    "wnież"     -0.47
    "Chham"     -0.47
    "picasso"   -0.46
    " rhestr"   -0.46
    "hold"      -0.45
    " ovp"      -0.45
    POSITIVE LOGITS
    Token       Logit
    "*/"        1.06
    " */"       1.05
    "*/\n"      0.87
    " */\n"     0.86
    ")*/"       0.80
    ".*/"       0.73
    "]-->"      0.73
    "})*/"      0.71
    "**/"       0.70
    "};*/"      0.70
    Act Density 0.049%

    No Known Activations