INDEX
    Explanations

    is / are followed by adjective or placeholder

    Negative Logits
    Token        Value
    "ش"          1.02
    "?"          1.00
    (missing)    0.98
    "*"          0.96
    "ل"          0.95
    "ти"         0.93
    (missing)    0.86
    "ول"         0.85
    "أ"          0.85
    "خ"          0.82
    Positive Logits
    Token        Value
    " are"       1.14
    "d"          1.13
    " is"        0.99
    "dır"        0.92
    "t"          0.91
    "larını"     0.89
    "した"       0.86
    " has"       0.82
    "larının"    0.80
    "ła"         0.77
    Activation Density: 0.705%

    No Known Activations