    Explanations

    gradient colors

    Negative Logits
    "_boolean"    -0.09
    " demol"      -0.09
    "_delete"     -0.08
    " 공동"        -0.08
    "_un"         -0.08
    " burgl"      -0.08
    "리는"         -0.08
    "_data"       -0.08
    "_false"      -0.08
    "خص"          -0.07
    Positive Logits
    " gradient"   0.16
    " gradients"  0.15
    " Gradient"   0.15
    "gradient"    0.14
    (token text missing)  0.14
    "Gradient"    0.14
    " fades"      0.12
    "_gradient"   0.12
    ".gradient"   0.12
    " fading"     0.11
    Activation Density: 0.009%

    No Known Activations