INDEX
    Explanations

    square shapes, faces, panels

    NEGATIVE LOGITS
    "ت"      0.73
    "т"      0.68
    " It"    0.65
    "ي"      0.65
    " on"    0.64
    "يّ"      0.64
    " On"    0.60
    "وفر"    0.59
    " If"    0.59
    " 理解"  0.58
    POSITIVE LOGITS
    " as"    0.84
    "f"      0.70
    "c"      0.70
             0.70
    "۰"      0.70
    "ast"    0.69
    " და"    0.68
    "and"    0.66
             0.66
    "im"     0.65
    Activation Density 0.001%

    No Known Activations