INDEX
    NEGATIVE LOGITS
    ".widgets"       -0.07
    "Aside"          -0.07
    " shift"         -0.07
    "Overview"       -0.07
    "-work"          -0.07
    " Studies"       -0.07
    "вичай"          -0.07
    " Leigh"         -0.07
    "uity"           -0.07
    " admittedly"    -0.07
    POSITIVE LOGITS
    " dense"          0.11
    " Dense"          0.10
    "_dense"          0.08
    "dense"           0.07
    "+E"              0.07
    "EN"              0.07
    ">E"              0.07
    " dens"           0.07
    "E"               0.07
    ".E"              0.06
    Activation Density: 0.004%

    No Known Activations