INDEX
    Explanations

    auxiliary verb constructions

    Negative Logits
    Token      Value
    "ка"       1.27
    "at"       1.24
    "ل"        1.21
    "ל"        1.18
    "it"       1.10
    "ת"        1.09
    "'"        1.05
    "ه"        1.05
    "if"       0.99
               0.99
    Positive Logits
    Token      Value
    " "        1.52
    " be"      1.02
    " of"      0.99
    " is"      0.98
    "dır"      0.90
    " OF"      0.83
    "的声音"   0.83
    "های"      0.80
    "of"       0.78
               0.77
    Activation Density: 0.334%

    No Known Activations