    Explanations

    network followed by attribute

    Negative Logits
    مي  1.13
    [token not shown]  1.04
    ков  1.01
    ك  0.98
    na  0.96
    de  0.93
    ного  0.93
     كتاب  0.91
    ка  0.89
    ные  0.86
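    Positive Logits
    ک  1.25
    [token not shown]  1.11
     networks  1.03
    [token not shown]  1.02
    [token not shown]  1.01
    [token not shown]  0.96
    [token not shown]  0.96
     Network  0.95
    かわいい  0.95
    [token not shown]  0.93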
    Activation Density: 0.032%

    No Known Activations