    NEGATIVE LOGITS
    Token                      Logit
    "<bos>"                    -1.72
    "/*"                       -0.83
    (token not shown)          -0.81
    "<?"                       -0.80
    (blank/whitespace token)   -0.75
    "public"                   -0.74
    "/**"                      -0.73
    "//"                       -0.72
    "#"                        -0.70
    "protected"                -0.68
    POSITIVE LOGITS
    Token         Logit
    " maneu"       2.18
    " accla"       2.18
    " affor"       2.17
    " impra"       2.12
    " disagre"     2.12
    " increa"      2.04
    " reluct"      2.01
    " emphat"      1.98
    " excru"       1.97
    " strick"      1.89
    Activation Density: 0.071%

    No Known Activations