INDEX
    Explanations
    Negative Logits
    " répondit"    -1.13
    " afferma"     -1.04
    "noted"        -1.04
    " fermés"      -0.98
    " sauvages"    -0.98
    " scolaires"   -0.98
    " noted"       -0.98
    " mondta"      -0.98
    " katanya"     -0.96
    " avoient"     -0.96
    Positive Logits
    "<"            0.36
    " look"        0.33
    " them"        0.32
    "ан"           0.30
    " couple"      0.30
    " els"         0.26
    "le"           0.25
    " turn"        0.24
    " suppose"     0.24
    " reg"         0.24
    Act Density 0.002%

    No Known Activations