INDEX
    Explanations
    Negative Logits
     rhetoric         -0.06
     comercial        -0.06
    iteit             -0.06
     Empire           -0.06
     fila             -0.06
     през             -0.06
     Six              -0.06
     Lawn             -0.06
     polite           -0.06
     photographers    -0.06
    Positive Logits
     شور              0.07
    REV               0.07
    -heart            0.06
    关系                0.06
     ctrl             0.06
     (:               0.06
     [--              0.06
     równ             0.06
    =====↵            0.06
                      0.06
    Act Density 0.000%

    No Known Activations