INDEX
    Explanations

    apologies and negations

    NEGATIVE LOGITS
    Token           Logit
    " seriously"    -0.10
    " lis"          -0.10
    " ill"          -0.10
    " thank"        -0.10
    " pall"         -0.09
    "-Sah"          -0.09
    " und"          -0.09
    " ful"          -0.08
    "AspNet"        -0.08
    " yes"          -0.08
    POSITIVE LOGITS
    Token              Logit
    " sorry"           0.35
    "Sorry"            0.31
    " Sorry"           0.31
    "sorry"            0.30
    " tiế"             0.29
    "éģĹ"              0.27
    "Unfortunately"    0.26
    " regret"          0.26
    "SOR"              0.24
    "æĬ±"              0.24
    Activation Density: 0.322%

    No Known Activations
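
    Two of the positive-logit tokens above (éģĹ and æĬ±) look like raw
    byte-level BPE vocabulary strings rather than readable text. The sketch
    below shows one way to turn such strings back into text. It ASSUMES the
    model's tokenizer uses the GPT-2 style byte-to-character table; the page
    does not say which tokenizer is in use, so treat the result as a guess.
    The names bytes_to_unicode and decode_token are illustrative only.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as in GPT-2 style byte-level BPE.

    Assumption: the tokenizer behind this dashboard uses this mapping.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7E + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next code point from U+0100 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8.

    Expects the raw vocabulary form, where a leading space is written as
    U+0120 rather than a literal space character.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace invalid sequences
    # instead of raising.
    return raw.decode("utf-8", errors="replace")


for token in ["æĬ±", "éģĹ"]:
    decoded = decode_token(token)
    print(token, "->", decoded, [f"U+{ord(c):04X}" for c in decoded])
```

    Under that assumption each of the two tokens decodes to a single CJK
    character (U+62B1 and U+9057 respectively). If the tokenizer uses a
    different scheme, the mapping does not apply.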