INDEX
    Negative Logits
    (no visible token)  1.07
    ج  0.94
    (no visible token)  0.90
    ش  0.89
    ق  0.86
    (no visible token)  0.83
    ва  0.82
    ない  0.81
    この  0.81
    (no visible token)  0.81
    Positive Logits
    B  1.02
    on  0.89
    N  0.86
    اك  0.83
    S  0.82
    C  0.79
    L  0.77
    W  0.76
    P  0.75
    U  0.75
    Act Density 0.001%

    No Known Activations