    Negative Logits
    Token            Logit
    "<bos>"          -2.61
    "<?"             -0.77
    (not shown)      -0.72
    "/***"           -0.67
    (whitespace)     -0.64
    " spek"          -0.62
    " /**"           -0.61
    " dras"          -0.59
    " kast"          -0.58
    " elek"          -0.58
    Positive Logits
    Token            Logit
    " unlaw"          1.00
    " maneu"          1.00
    " unwarran"       0.98
    " toledo"         0.94
    " perfon"         0.94
    " increa"         0.92
    " affor"          0.91
    " tucson"         0.90
    " chrysler"       0.89
    " accla"          0.89
    Activation Density: 1.519%

    No Known Activations