    Explanations

    apology, ultimatum, extend, alibi, excuse

    Negative Logits
    Token               Logit
    " Announced"        -0.85
    "Enlight"           -0.85
    " Announcements"    -0.84
    "anap"              -0.83
    "Noice"             -0.75
    "peny"              -0.75
    "anonym"            -0.75
    "Doo"               -0.73
    " kirim"            -0.73
                        -0.73
    Positive Logits
    Token               Logit
    " an"               1.77
    " Ul"               1.47
    " ultimatum"        1.30
    "ulti"              1.18
    "Ul"                1.16
    " ulti"             1.09
    " olive"            0.98
    " uli"              0.90
    " apology"          0.88
    "HandleFunc"        0.87
    Activation Density: 0.052%

    No Known Activations