    NEGATIVE LOGITS
    -0.08   đức
    -0.07
    -0.07  TEXT
    -0.07   Trọng
    -0.07   gerade
    -0.07   مدى
    -0.06  Ϋ
    -0.06  hear
    -0.06   zu
    -0.06  trecht
    POSITIVE LOGITS
    0.08   subsid
    0.08  atitis
    0.07   cramped
    0.07  Hour
    0.07  _pw
    0.07   cosas
    0.07   ano
    0.07  ável
    0.07  igm
    0.07  旅馆
    Act Density 0.155%

    No Known Activations