INDEX
    Explanations
    NEGATIVE LOGITS
    1           1.57
    ‌ای          1.14
    nya         1.13
    ‌ها          1.11
    (blank)     1.09
     have       1.04
    (blank)     0.99
    ありません  0.98
     is         0.98
    ın          0.97
    POSITIVE LOGITS
    in          1.98
    inį         1.36
    ل           1.30
    at          1.30
    inah        1.28
    л           1.25
    ל           1.23
    nX          1.20
    inę         1.18
    ר           1.18
    Activation Density: 0.015%

    No Known Activations