INDEX
    Explanations

    phrases indicating positive outcomes or success

    because, due to, thanks to

    Negative Logits
    "poň"           -0.40
    " Alfaro"       -0.38
    " sorpresa"     -0.38
    "tuur"          -0.36
    " zaten"        -0.36
    " mł"           -0.35
    "processable"   -0.35
    "izowane"       -0.35
    " Penga"        -0.34
    " consigo"      -0.34
    Positive Logits
    " due"          0.98
    " because"      0.91
    "due"           0.88
    " BECAUSE"      0.87
    " thanks"       0.85
    " DUE"          0.84
    "because"       0.84
    " благодаря"    0.83
    "Because"       0.83
    " Because"      0.83
    Activation Density: 0.025%

    No Known Activations