    Negative Logits
    Token               Logit
    " misogyn"          -0.07
    " unaffected"       -0.07
    "condition"         -0.07
    "ولد"               -0.06
    "taskId"            -0.06
    "Animator"          -0.06
    "=is"               -0.06
    " STREAM"           -0.06
    "WithIdentifier"    -0.06
    "-Clause"           -0.06
    Positive Logits
    Token               Logit
    " enrol"            0.07
    "_ADDR"             0.07
    "DONE"              0.07
    " deprivation"      0.06
    " determination"    0.06
    " WAS"              0.06
    "LIGHT"             0.06
    "Adjust"            0.06
    " gone"             0.06
    "λογ"               0.06
    Activation Density: 0.021%

    No Known Activations