INDEX
    Explanations
    Negative Logits
    Token              Logit
    " hated"           -0.06
    ".yy"              -0.06
    " прибы"           -0.06
    " attacking"       -0.06
    " Yang"            -0.06
    "sigma"            -0.06
    " Lump"            -0.06
    " Broadcasting"    -0.06
    " Cedar"           -0.06
    "-roll"            -0.06
    Positive Logits
    Token              Logit
    "ENT"              0.08
    "εν"               0.08
    "ent"              0.07
    ".Read"            0.07
    "ент"              0.07
    "vent"             0.07
    "VENT"             0.07
    " concent"         0.07
    "nt"               0.07
    "เทศ"              0.06
    Activation Density: 0.045%

    No Known Activations