    Negative Logits
    " sfera"             -0.90
    " Consiglio"         -0.82
    " Dalla"             -0.82
    " apos"              -0.80
    " ($('#"             -0.80
    (token not shown)    -0.79
    "備考"               -0.77
    " transmite"         -0.76
    " identifica"        -0.76
    " limite"            -0.75
    Positive Logits
    "upp"                0.88
    "bara"               0.87
    "甜點"               0.84
    "agles"              0.81
    " petición"          0.77
    "Horário"            0.77
    "torio"              0.73
    "enuine"             0.73
    " vuelve"            0.73
    " إلى"               0.73
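The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below does that in plain Python. The function names `logit_effects` and `top_and_bottom`, the matrix layout (d_model rows by vocabulary columns), and every number are hypothetical toy values, not the real model's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every vocabulary token by the dot product of the feature's
    decoder direction with that token's column of the unembedding matrix.

    Assumed layout: w_u[i][j] is residual dimension i, token j.
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return (k most negative, k most positive) token scores."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]                              # most negative first
    positive = ranked[::-1][:k]                        # most positive first
    return negative, positive


# Hypothetical toy weights (d_model = 2, vocabulary of 4 tokens).
vocab = [" sfera", "upp", " limite", "bara"]
decoder_dir = [1.0, -0.5]
w_u = [
    [-0.6, 0.5, -0.4, 0.6],
    [0.6, -0.7, 0.7, -0.4],
]

negative, positive = top_and_bottom(logit_effects(decoder_dir, w_u, vocab), k=2)
```

With these toy weights the most negative tokens are " sfera" and " limite" and the most positive are "upp" and "bara". Real dashboards may additionally normalize or rescale the scores, so treat this as an illustration of the ranking step only.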
    Activation Density 0.003%
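An activation density of 0.003% corresponds to 3 tokens in every 100,000. The sketch below assumes density is the percentage of sampled tokens on which the feature's activation is strictly positive; the zero threshold, the sample size, and the helper name `activation_density` are assumptions for illustration, not details stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a confirmed setting.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical toy sample: the feature fires on 3 of 100,000 tokens.
acts = [0.0] * 100_000
for pos in (12, 40_517, 99_001):
    acts[pos] = 1.5

density = activation_density(acts)
```

A density this low is consistent with a feature that rarely fires on the sampled text.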

    No Known Activations