INDEX
    Explanations

    rejection and repelling advances

    Negative Logits
        Token               Score
        "ัก"                 0.60
        "রা"                 0.57
        "ोत्"                 0.55
        (no token shown)    0.53
        "बलेट"               0.52
        (no token shown)    0.52
        " hound"            0.52
        " 아니"             0.51
        " बनवा"              0.51
        "明治"              0.49
    Positive Logits
        Token               Score
        " Rejected"         1.04
        " rejected"         0.96
        " rejection"        0.86
        "rejected"          0.86
        "Rejected"          0.82
        " reject"           0.79
        "reject"            0.79
        " rejects"          0.78
        " Reject"           0.78
        " rejet"            0.72
    Act Density 0.027%

    No Known Activations