INDEX
    Explanations

    we you they give tell provide

    Negative Logits
    "Was"       2.11
    " Was"      2.04
    "was"       1.99
    " was"      1.88
    " быть"     1.82
    " wasnt"    1.70
    " WAS"      1.69
    " Wasn"     1.64
    " был"      1.63
    " wasn"     1.62
    Positive Logits
    " vengono"    1.99
    " می‌شوند"    1.72
    " đều"        1.70
    " помогают"   1.59
    " fazem"      1.59
    " начинают"   1.56
    " जातात"      1.52
    " ficam"      1.52
    " doivent"    1.52
    " are"        1.50
    Act Density 0.345%

    No Known Activations