    Negative Logits
    ین  1.06
    </h2>  0.91
    ید  0.84
    </h4>  0.84
    ک  0.83
    𝗻  0.79
     are  0.79
     ہے  0.75
    ında  0.72
    </h3>  0.71
    Positive Logits
    ي  1.11
    '  0.97
    厚的  0.93
    Z  0.93
     Thick  0.90
    ل  0.89
     épais  0.87
    По  0.83
    ت  0.82
    lend  0.80
    Activation Density: 0.013%

    No Known Activations