INDEX
    Explanations

    endings and conclusions

    Negative Logits
    '                 1.88
    ъ                 1.34
    ב                 1.30
    [tab tab]         1.30
    [not rendered]    1.27
    ار                1.26
    ע                 1.25
    [spaces]          1.18
    é                 1.18
    [spaces]          1.16
    Positive Logits
    the               1.52
    end               1.23
    ris               1.20
    tta               1.17
    ла                1.09
    time              1.06
    端的              1.06
    ran               1.05
    at                1.04
    ك                 1.03
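    Logit lists like these are typically produced by multiplying a feature's decoder direction by the model's unembedding matrix and ranking the resulting per-token values. Below is a minimal sketch under that assumption. The names `top_logit_effects`, `w_dec_row`, and `w_u` are hypothetical, and the random toy weights are placeholders for a real model.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, token_strings, k=10):
    """Return (token, value) pairs for the k largest and k smallest logit
    contributions of one feature direction (hypothetical helper)."""
    # Direct effect of the feature direction on every vocabulary logit: shape (vocab,)
    logit_effects = w_dec_row @ w_u
    order = np.argsort(logit_effects)  # indices sorted by ascending value
    negative = [(token_strings[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(token_strings[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy model: d_model = 8, vocabulary of 20 tokens, random weights.
rng = np.random.default_rng(0)
d_model, vocab = 8, 20
w_dec_row = rng.normal(size=d_model)       # one feature's decoder direction
w_u = rng.normal(size=(d_model, vocab))    # unembedding matrix
tokens = [f"tok{i}" for i in range(vocab)]

positive, negative = top_logit_effects(w_dec_row, w_u, tokens, k=3)
print("positive:", positive)
print("negative:", negative)
```

    Ranking the raw product keeps the sketch simple; a real dashboard may normalize or scale the direction first.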
    Activation Density: 0.115%
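    The density figure is usually read as the share of token positions on which the feature is active, so 0.115% means roughly one token in every 870. The sketch below assumes that definition ("active" meaning activation > 0); the function name and the 23-in-20,000 example are hypothetical.

```python
import numpy as np

def activation_density(activations):
    """Fraction of token positions where the feature is active (activation > 0)."""
    activations = np.asarray(activations)
    return np.count_nonzero(activations > 0) / activations.size

# Hypothetical example: the feature fires on 23 of 20,000 token positions.
acts = np.zeros(20_000)
acts[:23] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.115%
```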

    No Known Activations