    Negative Logits
    <bos>               -0.60
     שוליים             -0.57
    abestanden          -0.56
    inspir              -0.54
     concealer          -0.53
     crystal            -0.52
     noten              -0.52
     recy               -0.51
    amarin              -0.51
     eddy               -0.51

    Positive Logits
    principalColumn     0.60
     مشين               0.55
    umumkan             0.53
     promozione         0.52
     varandra           0.50
     promoção           0.49
    Jeografia           0.49
     pessoais           0.49
    qrstuvwxyz          0.49
     peindre            0.49
    Activation Density: 0.002%

    No Known Activations