INDEX
    Explanations
    Negative Logits
     fried       -0.07
     epoch       -0.06
     Hull        -0.06
     Abed        -0.06
    .ID          -0.06
    -pl          -0.06
     attached    -0.06
    ?(:          -0.06
    ilated       -0.06
    .sat         -0.06
    Positive Logits
     sadece      0.08
     기다        0.07
    ;",          0.07
     tăng        0.07
    ению         0.06
    (QL          0.06
    ندر          0.06
     hele        0.06
    _CRE         0.06
     La          0.06
    Act Density 0.011%

    No Known Activations