    Negative Logits
    resident      -0.09
     dove         -0.08
                  -0.08
                  -0.08
     noqa         -0.07
    wnie          -0.07
    irio          -0.07
    bolo          -0.07
    antos         -0.07
    comment       -0.07
    Positive Logits
     Evid         0.08
     Det          0.08
    ежать         0.07
    адки          0.07
     footprint    0.07
     evid         0.07
    Closest       0.07
     journalism   0.07
     ци           0.07
    Detach        0.07
    Act Density 0.002%

    No Known Activations