    Negative Logits
    Token            Logit
    " TASK"          -0.07
    " Freud"         -0.07
    " sider"         -0.06
    " Tabs"          -0.06
    " Lit"           -0.06
    " Richardson"    -0.06
    " ابتدا"          -0.06
    " fac"           -0.06
    " Parents"       -0.06
    "Field"          -0.06
    Positive Logits
    Token            Logit
    "EN"             0.08
    "orge"           0.08
    "=/"             0.07
    "en"             0.07
                     0.07
    "enh"            0.07
    "ンガ"            0.07
    "prehensive"     0.07
    "WN"             0.07
    "ен"             0.07
    Activation Density: 0.005%

    No Known Activations