INDEX
    Explanations
    NEGATIVE LOGITS
    "ig"          -0.09
    "ic"          -0.08
    "IC"          -0.08
    " Ying"       -0.08
    ".indices"    -0.07
    "c"           -0.07
    " lic"        -0.07
    "ovic"        -0.07
    " ci"         -0.07
    "IG"          -0.07
    POSITIVE LOGITS
    " from"       0.31
    " From"       0.23
    " FROM"       0.19
    "from"        0.19
    "From"        0.18
    "\tfrom"      0.15
    "-from"       0.14
    "—from"       0.14
    ".From"       0.13
    "#from"       0.13
    Activation Density: 0.269%

    No Known Activations