INDEX
    Explanations
    Negative Logits
    -0.08
     Noticed
    -0.08
    .Exit
    -0.07
    andaş
    -0.07
     controversial
    -0.07
    <td
    -0.06
    ightly
    -0.06
     пункт
    -0.06
    -0.06
    ılır
    -0.06
    Positive Logits
    0.06
    idf
    0.06
    _bb
    0.06
    ohan
    0.06
    0.06
    ――――
    0.06
    ]<=
    0.06
     SL
    0.06
    -object
    0.05
    BT
    0.05
    Act Density 0.199%

    No Known Activations