    Explanations

    possessive or attributive relation

    Negative Logits
        Token   Value
        h       1.65
        (       1.40
        ad      1.27
        \       1.23
        >       1.21
        Q       1.19
        E       1.18
        S       1.16
        ض       1.13
        Z       1.12
    Positive Logits
        Token   Value
        с       2.13
        ра      1.80
        س       1.50
        то      1.47
        ли      1.44
        ре      1.41
        ла      1.29
                1.25
        ס       1.25
        ро      1.23
    Activation Density: 0.065%

    No Known Activations