    NEGATIVE LOGITS
     Prüfung        -0.76
    ویژگی           -0.74
    rouille         -0.73
    duga            -0.72
    arté            -0.71
    entier          -0.71
     Präsentation   -0.70
    clientes        -0.69
     Zurück         -0.69
     Betrachtung    -0.69
    POSITIVE LOGITS
     limão          0.79
     renovating     0.77
    seamnă          0.76
     Ponto          0.76
    ctuation        0.75
     Wellesley      0.73
     Liberation     0.73
                    0.73
    глу             0.72
     liberating     0.71
    Activation Density: 0.004%

    No Known Activations