    NEGATIVE LOGITS
    wl    0.71
     straighten    0.71
    fl    0.68
    ")    0.64
    "]    0.64
    Advertise    0.62
    ')    0.61
     التعليم    0.61
    ]    0.61
     الي    0.60
    POSITIVE LOGITS
    ین    1.14
    ی    0.97
    й    0.95
    م    0.78
    یان    0.74
    (blank)    0.72
    ю    0.72
    (blank)    0.71
    𝒊    0.70
    5    0.70
    Act Density 0.001%

    No Known Activations