INDEX
    Explanations
    Negative Logits
    (cell           -0.07
     confinement    -0.07
    (canvas         -0.07
                    -0.07
    _sheet          -0.07
    (job            -0.07
     Oversight      -0.06
     sea            -0.06
    RowCount        -0.06
    Early           -0.06
    Positive Logits
    icates          0.07
    זמ              0.07
    /design         0.07
    ogue            0.07
    基�             0.07
                    0.06
    向记者          0.06
                    0.06
    ます            0.06
                    0.06
    Act Density 0.099%

    No Known Activations