    Negative Logits
        "ência"               -0.07
        "LINE"                -0.06
        " estimated"          -0.06
        " bookmark"           -0.06
        "ENTE"                -0.06
        " Ά"                  -0.06
        " difícil"            -0.06
        "_present"            -0.06
        "358"                 -0.06
        (token not rendered)  -0.06
    Positive Logits
        " spies"              0.07
        "adla"                0.07
        " hinges"             0.06
        " subsid"             0.06
        "mel"                 0.06
        " ψ"                  0.06
        "=User"               0.06
        " (**"                0.06
        ">');↵"               0.06
        " γυνα"               0.06
    Activation density: 0.083%

    No known activations.