    Negative Logits
    Token           Logit
    "nos"           -1.67
    "```"           -1.64
    "istan"         -1.52
    "ulos"          -1.46
    "tery"          -1.46
    "holder"        -1.43
    "vous"          -1.42
    "ocha"          -1.41
    "rais"          -1.38
    "conviction"    -1.37
    Positive Logits
    Token                    Logit
    "ĥ½"                     2.31
    "ij"                     1.91
    "ĨĴ"                     1.90
    "ĸ"                      1.82
    "Ĵ"                      1.82
    (whitespace run)         1.80
    (whitespace run)         1.80
    (whitespace run)         1.80
    "↵↵     "                1.80
    (token text missing)     1.80
    Activation Density: 0.269%

    No Known Activations