    Negative Logits
    Token       Value
    "ih"        1.41
    "k"         1.36
    "ى"         1.31
    "g"         1.30
    "cL"        1.18
    "h"         1.17
    "c"         1.17
    "ि"         1.16
    "m"         1.14
    "ing"       1.13
    Positive Logits
    Token             Value
    "на"              1.38
    (not shown)       1.38
    "տ"               1.23
    " "               1.08
    (not shown)       1.07
    "ה"               1.03
    " particulier"    1.02
    "τή"              0.98
    "isierten"        0.97
    "มือ"              0.95
    Activation Density: 0.060%

    No Known Activations