    Negative Logits
    Token                Value
    " apprend"           0.62
    " intelligents"      0.60
    " internationaux"    0.59
    " armes"             0.58
    " italiana"          0.57
    " auront"            0.57
    " organiques"        0.57
    " immagine"          0.56
    " azienda"           0.55
    " exécut"            0.55
    Positive Logits
    Token                Value
    "f"                  0.54
    "la"                 0.49
    "int"                0.47
    "one"                0.46
    "T"                  0.46
    " Minority"          0.45
    "ash"                0.45
    "ne"                 0.44
    "st"                 0.44
    "ane"                0.44
    Activation Density: 1.259%

    No Known Activations