    Explanations

    types of written content

    Negative Logits
    (blank)      0.46
    " كلمات"     0.42
    "制品"       0.41
    " Bücher"    0.41
    " كلام"      0.41
    "يدي"        0.40
    (blank)      0.38
    " الي"       0.38
    "相同"       0.38
    " großes"    0.38
    Positive Logits
    "ک"          0.46
    "ok"         0.46
    "il"         0.45
    "ulating"    0.44
    "f"          0.44
    "id"         0.43
    "لی"         0.42
    "optera"     0.42
    "-"          0.42
    (blank)      0.42
    Activation Density: 0.007%

    No Known Activations