    Explanations

    dashed and dotted lines

    Negative Logits
    Token               Logit
    (token not shown)   -0.07
    đúng                -0.06
    pyx                 -0.06
    ан                  -0.06
    HEAD                -0.06
    incel               -0.06
    xb                  -0.06
    _strcmp             -0.06
    которое             -0.05
    All                 -0.05
    Positive Logits
    Token               Logit
    ا�                  0.09
    duct                0.08
    demographics        0.07
    capturing           0.07
    Eisen               0.07
    Supplement          0.06
    sexism              0.06
    ASURE               0.06
    eleri               0.06
    <Form               0.06
    Activation Density: 0.001%

    No Known Activations