INDEX
    Explanations
    Negative Logits
    Token               Logit
    "roat"              -0.08
    "patibility"        -0.08
    "traî"              -0.07
    " дов"              -0.07
    "urnished"          -0.07
    " Coordinates"      -0.07
    "ention"            -0.07
    "adero"             -0.07
    " undergraduate"    -0.07
    "obody"             -0.07
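    Positive Logits
    Token               Logit
    (no visible token)  0.08
    (no visible token)  0.07
    " qi"               0.07
    (no visible token)  0.06
    "(file"             0.06
    " «"                0.06
    " Lucky"            0.06
    (no visible token)  0.06
    " обяз"             0.06
    "网友们"            0.06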
    Activation Density: 0.058%

    No Known Activations