    Explanations

    abbreviated units and codes

    Negative Logits
    Token               Value
    "l"                 0.95
    " a"                0.93
    " I"                0.83
    " A"                0.67
    "A"                 0.66
    " B"                0.63
    (no token shown)    0.61
    " J"                0.58
    (no token shown)    0.58
    " D"                0.57
    Positive Logits
    Token               Value
    "на"                1.14
    "ش"                 0.92
    " on"               0.89
    "د"                 0.83
    "ح"                 0.78
    (no token shown)    0.77
    "ص"                 0.77
    "و"                 0.76
    "ج"                 0.75
    (no token shown)    0.75
    Activation Density: 0.439%

    No Known Activations