    Negative Logits
    Token              Logit
    [token missing]    -0.07
    " edi"             -0.07
    " норм"            -0.06
    "이었다"           -0.06
    "_f"               -0.06
    "Tiny"             -0.06
    "بری"              -0.06
    ",%"               -0.06
    ".feed"            -0.06
    " دفاع"            -0.06
    Positive Logits
    Token              Logit
    "INCLUDING"        0.11
    "CONTENT"          0.07
    " Akron"           0.06
    "lerinden"         0.06
    "-Agent"           0.06
    "PGA"              0.06
    "εκ"               0.06
    "ическая"          0.06
    "FromClass"        0.06
    "-away"            0.06
    Act Density 0.001%

    No Known Activations