INDEX
    Explanations
    NEGATIVE LOGITS
    " произ"         -0.07
    "55"             -0.07
    " Morris"        -0.07
    " Jahr"          -0.07
    " alum"          -0.07
    " do"            -0.07
    " comparison"    -0.07
    ".did"           -0.07
    "44"             -0.07
    ",'%"            -0.07
    POSITIVE LOGITS
    " gate"          0.18
    " Gate"          0.16
    "gate"           0.14
    " gates"         0.13
    "Gate"           0.13
    " Gates"         0.12
    " gateway"       0.12
    " gating"        0.11
    "_gate"          0.10
    "Gateway"        0.09
    Activation Density: 0.009%

    No Known Activations