    Negative Logits
     étoient    -1.33
     avoient    -1.33
     feroit     -1.32
     démocr     -1.30
     cérami     -1.27
     quelcon    -1.27
     auroit     -1.26
     enfans     -1.24
     normaux    -1.24
     définiti   -1.23
    Positive Logits
     the        0.65
    ,           0.63
    ↵↵          0.63
     in         0.62
     and        0.60
     a          0.60
                0.60
                0.58
    .           0.58
     for        0.57
    Activation Density: 0.182%

    No Known Activations