    Explanations
    No Explanations Found
    Negative Logits
    ike             -0.26
    }\\             -0.24
     preferring     -0.24
     stained        -0.23
    _builtin        -0.23
    aar             -0.23
    ILA             -0.23
    brush           -0.23
    (reinterpret    -0.23
    atatype         -0.23
    Positive Logits
    ç»ŀ             0.28
     gist           0.26
    æĪİ             0.25
    flow            0.25
    -flow           0.25
     zg             0.25
    _flow           0.25
    epar            0.24
     Himself        0.24
    pair            0.24
    Activation Density: 0.005%

    No Known Activations

    This feature has no known activations.