    Explanations

    business, outcomes, or types

    Negative Logits
        "ج"             1.10
        "ش"             0.93
        (not rendered)  0.90
        (not rendered)  0.89
        "ت"             0.88
        "ك"             0.84
        (not rendered)  0.82
        " in"           0.79
        "ع"             0.77
        "g"             0.74
    Positive Logits
        " a"            0.84
        "↵↵"            0.76
        " v"            0.75
        "いて"          0.73
        ";"             0.73
        " "             0.71
        "hta"           0.70
        "ou"            0.68
        "8"             0.68
        (not rendered)  0.68
    Act Density 0.004%

    No Known Activations