INDEX
    Explanations

    ending, vanishing, destroying

    Negative Logits
    " inégal"        0.83
    "缺少"           0.81
    " inconsistent"  0.79
    " desigual"      0.78
    "不稳定"         0.77
    "müş"            0.76
    " تاثیر"         0.75
    "umsuz"          0.72
    " beeinfl"       0.72
    " अतिक"          0.72
    Positive Logits
    " inciner"  0.99
    " exorc"    0.89
    " DELETE"   0.88
    " vanish"   0.87
    " drown"    0.82
    " BURN"     0.82
    " suic"     0.81
    " refunds"  0.80
    " delete"   0.80
    " rewrite"  0.80
    Act Density 0.208%

    No Known Activations