    Negative Logits
    د    2.05
    d    2.02
    ل    1.63
    (not shown)    1.44
    ד    1.37
    {,}    1.35
    д    1.30
    (not shown)    1.27
    {    1.26
     powied    1.19
    Positive Logits
    </h3>    1.44
    </h2>    1.42
    ö    1.27
    </h1>    1.25
    (not shown)    1.22
    </h4>    1.20
    ia    1.17
    -    1.17
    </span>    1.16
    </h5>    1.15
    Activation Density: 0.045%

    No Known Activations