INDEX
    Explanations
    NEGATIVE LOGITS
     Ressource    -0.91
     Cartes       -0.90
     quí          -0.85
     cannes       -0.84
     incess       -0.83
     siff         -0.80
     glan         -0.80
     Rois         -0.79
     prodi        -0.79
     doman        -0.78
    POSITIVE LOGITS
     ironic          0.91
     irony           0.86
     posX            0.65
     ironically      0.64
     paradoxical     0.61
     blest           0.58
     xPos            0.57
     overcrow        0.57
     umożli          0.56
     hypocritical    0.56
    Activation Density: 0.125%

    No Known Activations