INDEX
    Negative Logits
    Token          Logit
    " AFR"         -0.07
    "ando"         -0.07
    " finir"       -0.07
    " AUD"         -0.07
    "adding"       -0.07
    " dollars"     -0.07
    "уй"           -0.07
    "alore"        -0.07
    " оди"         -0.07
    "abilidad"     -0.07
    Positive Logits
    Token          Logit
    " Roles"       0.08
    " poi"         0.08
    "ยัง"           0.08
    " sill"        0.08
    " ayrıca"      0.08
    "ропа"         0.08
    "_roles"       0.08
    " nanti"       0.08
    " lagu"        0.08
    " Poc"         0.08
    Activation Density: 0.128%

    No Known Activations