    Negative Logits (tokens in quotes; a leading space is part of the token)
        " descri"      -0.07
        "or"           -0.07
        " Earl"        -0.07
        "/article"     -0.07
        " نیروی"       -0.07
        " элек"        -0.07
        " Crud"        -0.07
        " visitor"     -0.06
        "readcr"       -0.06
        " visitors"    -0.06
    Positive Logits
        " same"        0.23
        "same"         0.19
        " Same"        0.19
        "Same"         0.16
        " SAME"        0.15
        "SAME"         0.13
        "_same"        0.13
        " mismo"       0.11
        ".same"        0.11
        "ame"          0.10
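A minimal sketch of how lists like the two above are commonly produced. It assumes, as is typical for sparse-autoencoder feature pages, that each value is the feature's decoder direction projected through the model's unembedding matrix (ignoring the final LayerNorm). The helper names `logit_effects` and `top_k` and all vectors are hypothetical toy values, not this feature's weights.

```python
# Hypothetical sketch: rank a feature's direct effect on output logits.
# Assumption: effect = decoder_dir @ W_U, with no final LayerNorm applied.
# All numbers below are toy values chosen for illustration.

def logit_effects(decoder_dir, unembed):
    """Project a d_model-length direction through a d_model x vocab matrix."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_k(effects, vocab, k, largest=True):
    """Return the k (token, value) pairs with the largest (or smallest) effect."""
    pairs = sorted(zip(vocab, effects), key=lambda p: p[1], reverse=largest)
    return pairs[:k]


vocab = [" same", "same", " visitor", "or"]
decoder_dir = [1.0, -0.5]          # toy d_model = 2
unembed = [                        # toy W_U: 2 x 4
    [0.2, 0.1, -0.03, 0.0],
    [-0.06, -0.02, 0.08, 0.16],
]

effects = logit_effects(decoder_dir, unembed)
print(top_k(effects, vocab, 2))                 # positive logits
print(top_k(effects, vocab, 2, largest=False))  # negative logits
```

Sorting the full vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` / `heapq.nsmallest` avoid a full sort.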
    Activation density: 0.051%
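Activation density is usually read as the fraction of token positions on which the feature fires, so 0.051% means the feature is active on roughly 1 token in 2,000. A minimal sketch, assuming "fires" means an activation strictly greater than zero; the function name `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density = active positions / total positions.
# Assumption: a position counts as active when its activation exceeds threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: one active position out of 2,000 tokens.
acts = [0.0] * 2000
acts[17] = 3.2
print(f"{activation_density(acts):.3%}")  # 0.050%
```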

    No Known Activations