INDEX
    Explanations
    Negative Logits
    Token          Logit
    "_super"       -0.08
    " маг"         -0.07
    " rng"         -0.07
    " hateful"     -0.06
    "_sold"        -0.06
    (not shown)    -0.06
    "ají"          -0.06
    ".onStart"     -0.06
    " Eug"         -0.06
    " Ter"         -0.06
    Positive Logits
    Token          Logit
    " пораж"       0.06
    " yerinde"     0.06
    "/problems"    0.06
    ".ds"          0.06
    " LS"          0.06
    " cn"          0.06
    "CODE"         0.06
    ".j"           0.06
    " ava"         0.06
    (not shown)    0.06
    Activation Density: 0.010%

    No Known Activations