    Explanations
    Negative Logits
     cibl           -0.08
     müd            -0.08
     cib            -0.08
     doses          -0.08
     mitigate       -0.08
     حجر            -0.08
     workplace      -0.07
     motivations    -0.07
                    -0.07
     interviewer    -0.07
    Positive Logits
     bowed          0.08
    确定            0.08
    线              0.08
     snar           0.08
    blink           0.08
     determinação   0.07
    take            0.07
    arl             0.07
    lines           0.07
    polygon         0.07
    Act Density 0.004%

    No Known Activations