    Negative Logits
    Token         Logit
     he           -1.09
     diputados    -0.98
                  -0.92
    MLA           -0.89
     mere         -0.88
    Σε            -0.88
    否则          -0.85
                  -0.85
     same         -0.85
    uParam        -0.84
    Positive Logits
    Token         Logit
    DED           0.98
     Emb          0.95
    𖥧             0.90
    𝙻             0.89
     стал         0.89
     iklan        0.85
    </strong>     0.85
    tk            0.84
                  0.82
                  0.81
    Activation Density: 0.024%

    No Known Activations