INDEX
    Explanations
    Negative Logits
    𝘁
    0.72
    𝗹
    0.72
    𝗱
    0.71
    𝗵
    0.63
    𝘀
    0.62
    𝘂
    0.59
    𝗿
    0.59
    ية
    0.59
    𝗴
    0.59
    0.57
    Positive Logits
    or
    0.54
    Finale
    0.52
    et
    0.52
     употре
    0.50
     profesora
    0.49
     sûr
    0.48
     pensée
    0.48
     Alpes
    0.46
    și
    0.45
    em
    0.45
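    A minimal sketch of how token lists like the two above are commonly produced for a sparse-autoencoder feature. This page does not state its method. The sketch assumes each token's score is the direct logit effect, meaning the dot product of the feature's decoder direction with that token's unembedding vector. The function and parameter names (`top_logit_effects`, `decoder_dir`, `unembed`) are hypothetical, and plain Python lists stand in for real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) token lists ranked by direct logit effect.

    Assumption (hypothetical): the score for a token is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending by score; the stable sort keeps insertion order for ties.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with three tokens.
    pos, neg = top_logit_effects(
        [1.0, 0.0],
        {"a": [2.0, 0.0], "b": [-1.0, 0.0], "c": [0.0, 5.0]},
        k=1,
    )
    print(pos, neg)
```

    With real weights, `decoder_dir` would be one row of the SAE decoder and `unembed` would map every vocabulary token to its column of the model's unembedding matrix.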
    Activation Density 0.028%
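    A minimal sketch of the activation-density figure. It assumes density is the percentage of evaluated tokens on which the feature's activation is above zero. The page does not give a definition or threshold, and `activation_density` and its `threshold` parameter are hypothetical. Under this assumption, 0.028% means the feature fires on about 28 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): a token "activates" the feature when its
    activation is strictly greater than the threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 28 firing tokens out of 100,000 gives a density of about 0.028%.
    acts = [0.0] * 99_972 + [1.0] * 28
    print(f"{activation_density(acts):.3f}%")
```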

    No Known Activations