INDEX
    NEGATIVE LOGITS
     egens           -2.03
    After            -1.86
     virke           -1.84
     millioner       -1.82
     after           -1.81
     samti           -1.77
     ansatte         -1.76
    Despite          -1.74
     stål            -1.69
     tett            -1.69
    POSITIVE LOGITS
     Brings          1.98
     prachtige       1.90
     astounding      1.82
     and             1.78
     a               1.73
    了些             1.69
    了许多           1.68
     verschillende   1.68
    了一些           1.67
     lauded          1.66
    Activation Density: 0.004%

    No Known Activations