INDEX
    Explanations
    Negative Logits
    "кас"        -0.07
    "alf"        -0.07
    "yan"        -0.07
    "ares"       -0.06
    " Problems"  -0.06
    "heiro"      -0.06
    "ไฟ"         -0.06
    "iring"      -0.06
    " human"     -0.06
    "meni"       -0.06
    Positive Logits
    " Media"       0.07
    " media"       0.07
    " Breitbart"   0.07
    "itbart"       0.07
    "SetText"      0.06
    " 영상"          0.06
    ".startsWith"  0.06
    " 香港"          0.06
                   0.06
    "_arr"         0.06
    Activation Density: 0.021%

    No Known Activations