INDEX
    Explanations

    spatial/temporal relations

    Negative Logits
     ід           -0.07
    oku           -0.07
    rooms         -0.07
    _axis         -0.07
    (return       -0.07
     smuggling    -0.06
    ↵   ↵         -0.06
    股东          -0.06
    ابق           -0.06
     ↵            -0.06
    Positive Logits
     amor         0.07
     ديسمبر       0.06
    ISODE         0.06
     Şimdi        0.06
    žití          0.06
     nastav       0.06
     vais         0.06
     Tyto         0.06
     afterward    0.06
     Kendrick     0.06
    Activation Density: 0.144%

    No Known Activations