INDEX
    Explanations

    references to actions or instructions, particularly related to reducing something

    Negative Logits
    <bos>     -3.12
    <?        -0.97
    /**       -0.94
              -0.88
    /***      -0.83
              -0.81
    ///**     -0.73
    <?        -0.67
    /*        -0.66
    //---     -0.59
    Positive Logits
     kasa      1.34
     lele      1.34
     bandung   1.29
     jaya      1.25
     saar      1.20
     jati      1.20
     Minang    1.19
     emphat    1.17
     hina      1.17
     ftu       1.16
    Act Density 0.177%

    No Known Activations