INDEX
    Explanations
    Negative Logits
     lié          0.94
     daba         0.86
     deux         0.84
     sirop        0.80
     liberté      0.77
     constitue    0.75
     dedans       0.75
     déterminé    0.75
     médian       0.74
     demander     0.73
    Positive Logits
    m             1.05
    re            1.04
                  1.02
    вых           0.90
     Initially    0.88
    s             0.86
    h             0.86
    ы             0.85
    ת             0.84
    oriented      0.84
    Act Density 0.002%

    No Known Activations