INDEX
    Explanations

    phrases indicating things that function effectively or exceed expectations

    Negative Logits
    Token         Logit
    " Outs"       -0.18
    " outs"       -0.18
    ".comp"       -0.17
    "ubic"        -0.16
    "оÑı"         -0.16
    " outline"    -0.15
    "outs"        -0.15
    "elsey"       -0.15
    "NullOr"      -0.15
    "quip"        -0.15
    Positive Logits
    Token         Logit
    " gate"       0.35
    " Gate"       0.29
    " gates"      0.28
    "gate"        0.28
    "Gate"        0.27
    "_gate"       0.25
    " Gates"      0.25
    " blocks"     0.20
    "éĸ"          0.18
    "éĸĢ"         0.18
    Activation Density: 0.014%

    No Known Activations