INDEX
    Explanations
    Negative Logits
    Token          Logit
    " סוג"         -0.08
    " мех"         -0.07
    " Losing"      -0.07
    " منزل"        -0.07
    ".deg"         -0.07
    "Massage"      -0.07
    " الثال"       -0.07
    (not shown)    -0.06
    " reading"     -0.06
    (not shown)    -0.06
    Positive Logits
    Token          Logit
    "writer"       0.07
    "elist"        0.07
    "let"          0.07
    "VID"          0.07
    "win"          0.07
    "bs"           0.07
    " freed"       0.07
    "\twriter"     0.06
    "forEach"      0.06
    "numero"       0.06
    Activation Density: 0.028%

    No Known Activations