INDEX
    Explanations

    Phrases indicating a cause-and-effect relationship

    Negative Logits
    <bos>               -0.98
    /**                 -0.59
    (no visible token)  -0.58
     push               -0.52
    (no visible token)  -0.52
    (no visible token)  -0.50
     увиде              -0.50
    /*!                 -0.49
    leşti               -0.49
     pushed             -0.48
    Positive Logits
     uhr                 1.44
     Minang              1.40
     saar                1.36
     thereby             1.35
     maksi               1.30
     seksi               1.27
     Meksi               1.26
     keramik             1.25
     lemp                1.25
     Strukt              1.24
    Act Density 0.270%

    No Known Activations