    NEGATIVE LOGITS
    Token         Logit
    "angered"     -0.73
    "ité"         -0.71
    "rang"        -0.69
    "ÑĮ"          -0.67
    "RAY"         -0.67
    "kson"        -0.66
    " interns"    -0.66
    "rified"      -0.66
    "linger"      -0.65
    "ISON"        -0.65
    POSITIVE LOGITS
    Token            Logit
    " court"         0.90
    " succession"    0.86
    " spite"         0.82
    " battle"        0.80
    " finals"        0.79
    " favor"         0.77
    " terms"         0.77
    " rematch"       0.76
    " favour"        0.76
    " sight"         0.75
    Activation Density: 0.141%

    No Known Activations