    Negative Logits
    =+  0.47
    ຜະລິດຕະພັນ  0.46
     atteinte  0.46
     wounding  0.45
     encroach  0.45
     стан  0.44
     circumvent  0.44
    brown  0.44
     먼저  0.44
     विषम  0.43
    Positive Logits
    0.48
    на  0.48
    ã  0.47
    نش  0.47
    Grazie  0.44
    0.43
    ovina  0.43
    SER  0.41
     Isso  0.41
    лада  0.41
    Activation Density: 0.001%

    No Known Activations