INDEX
    Explanations
    NEGATIVE LOGITS
    Token        Logit
    un           -0.88
    uncut        -0.68
    une          -0.60
    uns          -0.59
    Un           -0.58
    unchecked    -0.56
    undis        -0.56
    unre         -0.55
    unde         -0.55
    una          -0.54
    POSITIVE LOGITS
    Token        Logit
    élas         0.87
    cérami       0.85
    vérit        0.78
    lèvres       0.77
    pertes       0.77
    stället      0.77
    âmes         0.75
    écout        0.75
    fenêtres     0.73
    étoient      0.72
    Activation Density: 0.055%

    No Known Activations