    Negative Logits
        Token                 Logit
        " Wen"                -0.07
        " therm"              -0.07
        " چت"                 -0.06
        "translator"          -0.06
        " firewall"           -0.06
        "ありがとうござ"        -0.06
        "CMP"                 -0.06
                              -0.06
        " OSX"                -0.06
        " clientes"           -0.06

    Positive Logits
        Token                 Logit
        ">>("                  0.07
        " [~,"                 0.06
        " zamanda"             0.06
        "ovky"                 0.06
        "```"                  0.06
        " uncover"             0.06
        " plt"                 0.06
        " закон"               0.06
        ".ViewModels"          0.06
        "Upgrade"              0.06

    Activation Density: 0.024%

    No Known Activations