    NEGATIVE LOGITS
    <bos>
    -3.14
    /***
    
    -0.91
    <?
    -0.73
    ///**
    -0.73
    
    
    -0.69
    //---
    -0.67
    /*
    -0.66
    #
    -0.66
    /*!
    
    -0.65
    -0.65
    POSITIVE LOGITS
     volunte
    1.78
     ecru
    1.75
     fortn
    1.70
     impra
    1.60
     unlaw
    1.60
     affor
    1.59
     ibiza
    1.58
     maneu
    1.58
     madonna
    1.53
     tolerably
    1.52
    Act Density 0.084%

    No Known Activations