    Negative Logits
    Token           Logit
    " Therefore"    -0.07
    " Maurit"       -0.07
    "_ratio"        -0.07
    "ández"         -0.06
    " بیرون"        -0.06
    " hungry"       -0.06
    " ander"        -0.06
    " financier"    -0.06
    "Attendance"    -0.06
    " DNA"          -0.06
    Positive Logits
    Token           Logit
    " casi"         0.06
    "まと"          0.06
    "+Sans"         0.06
    "ите"           0.06
    " egal"         0.06
    "_PREF"         0.06
    "!)."           0.06
    " weil"         0.06
    ".AD"           0.06
    "。',↵"         0.06
    Activation Density: 0.004%

    No Known Activations