INDEX
    Explanations

    Romance languages

    Negative Logits
    Token             Logit
    "_et"             -0.06
    " concat"         -0.06
                      -0.06
                      -0.06
    " voc"            -0.06
    "ods"             -0.06
    " چین"            -0.06
    " contribution"   -0.06
    "ayas"            -0.06
    "Enc"             -0.06
    Positive Logits
    Token             Logit
    " Applied"        0.07
                      0.07
    " lender"         0.07
    "anned"           0.06
    " Planned"        0.06
    " ateş"           0.06
    " gratitude"      0.06
    ",pos"            0.06
    "Denied"          0.06
    " naive"          0.06
    Activation Density: 0.013%

    No Known Activations