INDEX
    Negative Logits
    /Application  -0.07
     franch       -0.07
     slave        -0.06
    Contrib       -0.06
     gauge        -0.06
     idols        -0.06
    Writing       -0.06
     Pest         -0.06
    Starting      -0.06
     soup         -0.06

    Positive Logits
     undermines    0.07
    _SIDE          0.06
    очек           0.06
    ="/">          0.06
    upro           0.06
    (blank)        0.06
    anske          0.06
     темп          0.06
    (blank)        0.06
     Tay           0.06
    Activation Density: 0.248%

    No Known Activations