    Explanations

    Individuality/Belonging

    Negative Logits
    Token            Logit
    " пребы"         -0.08
    "capitalize"     -0.08
    " precaution"    -0.07
    ".seconds"       -0.07
    " erections"     -0.07
    ".capitalize"    -0.07
    "-fed"           -0.07
    " lid"           -0.07
    "samples"        -0.07
    "108"            -0.07
    Positive Logits
    Token             Logit
    " riêng"          0.10
    " nuances"        0.09
    " особенности"    0.09
    " aturan"         0.09
    " ню"             0.08
    " hierarchy"      0.08
    " encanto"        0.08
    " flair"          0.08
    " governing"      0.08
    " charme"         0.08
    Activation Density: 0.044%

    No Known Activations