    Explanations

    most followed by adjective

    Negative Logits
    Token   Value
    ك       1.60
            1.48
     I      1.28
    2       1.28
    ির      1.27
    จะ      1.23
    ка      1.20
    ной     1.20
    '}}     1.18
    أ       1.15
    Positive Logits
    Token   Value
    t       2.50
    tained  1.34
    تان     1.26
    تها     1.23
    ták     1.22
    p       1.21
    tive    1.19
    (       1.18
    ts      1.13
    tu      1.13
    Act Density 0.098%

    No Known Activations