    Negative Logits
    Token              Logit
    " unilaterally"    0.75
    " guarantees"      0.72
    "igit"             0.68
    (missing)          0.67
    "isola"            0.65
    "δα"               0.65
    "ͯ"                0.64
    "discrete"         0.62
    " discrete"        0.62
    "فق"               0.62
    Positive Logits
    Token              Logit
    " saludos"         0.84
    " Distress"        0.84
    (missing)          0.84
    (missing)          0.84
    " exposed"         0.84
    " celeste"         0.83
    " tejto"           0.82
    " unsure"          0.82
    " বেড়ে"            0.82
    " bellissimo"      0.81
    Activation Density: 0.048%

    No Known Activations