INDEX
    Explanations

    phrases indicating contrast or opposition

    Negative Logits
    <bos>               -1.56
    (whitespace)        -0.89
    (not rendered)      -0.79
    <?                  -0.78
    public              -0.77
    <?                  -0.77
    /**                 -0.71
    /*                  -0.71
    <>                  -0.64
    /*!                 -0.64
    Positive Logits
     maneu               2.10
     affor               1.97
     increa              1.86
     impra               1.82
     stockholm           1.78
     wien                1.78
     lidl                1.72
     inev                1.71
     aen                 1.67
     accla               1.65
    Activation Density: 0.109%

    No Known Activations