    Explanations

    possessive pronoun and relative pronoun

    Negative Logits
    "nya"  1.26
    "ni"  1.18
    "つまり"  1.16
    "tained"  1.12
    "ms"  1.07
    "la"  1.07
    " dotycz"  1.07
    "spring"  1.06
    "lerinden"  1.04
    "ts"  1.03
    Positive Logits
    "ل"  1.16
    "ят"  1.11
    "ن"  1.05
    "ير"  1.03
    "知道"  1.02
    "л"  1.02
    "ت"  0.98
    " уравнения"  0.91
    "多次"  0.86
    "行う"  0.86
    Act Density 0.073%

    No Known Activations