INDEX
    Negative Logits
    زا
    -0.07
    lett
    -0.07
    га
    -0.07
    녕하세요
    -0.07
    pair
    -0.07
     záv
    -0.07
     sociale
    -0.06
    fa
    -0.06
     owned
    -0.06
     archae
    -0.06
    Positive Logits
    iability
    0.07
    oku
    0.07
     Pain
    0.07
    0.07
    κού
    0.07
    0.06
     pain
    0.06
    [undecodable token]
    0.06
    0.06
     miệng
    0.06
    Activation Density: 0.006%

    No Known Activations