INDEX
    Explanations
    Negative Logits
     sources    -0.08
    measure     -0.07
    emotion     -0.07
     burada     -0.07
     nhật       -0.07
     Harvest    -0.07
     skyrocket  -0.07
     Sections   -0.07
     και        -0.07
    ��          -0.07
    Positive Logits
    ekkür       0.07
     gorgeous   0.07
    IW          0.06
     вет        0.06
     violates   0.06
     Boo        0.06
    middleware  0.06
     Aless      0.06
    _ipv        0.06
     dred       0.06
    Activation Density: 0.002%

    No Known Activations