INDEX
    Explanations
    Negative Logits
    _CP
    -0.06
    _FA
    -0.06
     Lust
    -0.06
     Guides
    -0.06
     slices
    -0.06
    иц
    -0.06
     Village
    -0.06
    .Brand
    -0.06
    اية
    -0.06
    _correct
    -0.06
    Positive Logits
     Oakland
    0.09
    querySelector
    0.07
     первую
    0.07
     Berkeley
    0.07
     inability
    0.06
    чива
    0.06
     cowork
    0.06
    خوان
    0.06
    0.06
     è
    0.06
    Activation Density: 0.013%

    No Known Activations