INDEX
    Explanations

    pronouns referring to the reader or audience

    Negative Logits
    <bos>
    -1.56
    -1.33
    <?
    -1.07
    
    
    -1.06
    /**
    -0.92
    /***
    
    -0.84
    /*
    -0.81
    <?
    
    -0.74
    #
    -0.70
    /**
    
    
    -0.66
    Positive Logits
     maroc          1.09
     meis           1.08
     maneu          0.96
     disreg         0.95
     désol          0.95
     endom          0.92
     lamborghini    0.91
     impra          0.90
     italia         0.90
     ibiza          0.90
    Act Density 0.107%

    No Known Activations