INDEX
    NEGATIVE LOGITS
    " respectivamente"  -0.89
    " enfans"           -0.81
    " natten"           -0.79
    " amitié"           -0.79
    " midnight"         -0.77
    " respectivement"   -0.77
    " touristes"        -0.75
    " normaux"          -0.75
    " artesanales"      -0.72
    " rispet"           -0.70
    POSITIVE LOGITS
    " []:"              0.59
    " Giles"            0.52
    "tas"               0.51
    " trunks"           0.50
    "tle"               0.49
    " letters"          0.48
    " RTL"              0.48
    "tl"                0.47
    "ness"              0.47
    " alt"              0.47
    Activation Density 0.087%

    No Known Activations