    Negative Logits
    Token                   Value
    " Sei"                  1.11
    (token not rendered)    0.95
    " primer"               0.94
    " creo"                 0.92
    " estatales"            0.90
    "ydı"                   0.89
    "ked"                   0.89
    " stijl"                0.89
    " Bremen"               0.89
    "ви"                    0.87
    Positive Logits
    Token                   Value
    "o"                     1.36
    "ه"                     1.32
    "𝐍"                     1.29
    "𝓾"                     1.19
    "ρθ"                    1.11
    "ozo"                   1.10
    "le"                    1.06
    "лакти"                 1.06
    (token not rendered)    1.05
    "𝓷"                     1.05
    Activation Density: 0.031%

    No Known Activations