INDEX
    Explanations

    essential, primary, highly, machine

    NEGATIVE LOGITS
    "ل"                3.75
    "м"                2.52
    "з"                2.50
    "م"                2.33
    "ра"               2.06
    " harassing"       1.97
    (token not shown)  1.95
    "ar"               1.91
    (token not shown)  1.87
    (token not shown)  1.85
    POSITIVE LOGITS
    (token not shown)  2.39
    "𝘈"                1.95
    "𝒟"                1.95
    (token not shown)  1.75
    "Ін"               1.70
    "Есть"             1.69
    " Един"            1.69
    "Під"              1.68
    " mejora"          1.67
    "𝙳"                1.66
    Activation Density 0.813%

    No Known Activations