    Negative Logits
    Token          Logit
    " lind"        -0.08
    " mj"          -0.08
    " hilarious"   -0.08
    " amante"      -0.08
    " сав"         -0.07
    "ублич"        -0.07
    " Mozart"      -0.07
    " $?"          -0.07
    "hic"          -0.07
    " التغ"        -0.07
    Positive Logits
    Token              Logit
    " hybrids"         0.10
    " hybride"         0.09
    " composites"      0.08
    " hybrid"          0.08
    (not captured)     0.08
    " foi"             0.08
    (not captured)     0.08
    (not captured)     0.08
    " banned"          0.08
    " híbr"            0.08
    Activation Density: 0.006%

    No Known Activations