INDEX
    Explanations
    No Explanations Found
    Negative Logits
        Token             Logit
        "A"               0.48
        "-"               0.46
        "ilah"            0.46
        "ises"            0.44
        "iveness"         0.43
        "’."              0.42
        "agan"            0.42
        " Disabilities"   0.42
        "]."              0.41
        "istic"           0.41
    Positive Logits
        Token             Logit
        "ت"               0.93
        "as"              0.92
        "ك"               0.90
        "i"               0.88
        "ي"               0.88
        "c"               0.82
        " on"             0.79
                          0.78
        "ed"              0.77
        "т"               0.77
    Act Density 3.429%
    No Known Activations