INDEX
    Explanations
    Negative Logits
     любых        0.44
    بینی          0.44
    вающих        0.41
     différents   0.41
     polygons     0.41
    異なる           0.41
     quos         0.40
     biopsies     0.40
    giveness      0.40
     pebbles      0.40
    Positive Logits
     educator     0.69
     motivator    0.67
    이자            0.65
     fundraiser   0.64
     winner       0.61
     forerunner   0.61
     leader       0.60
     storyteller  0.60
     protector    0.59
     centerpiece  0.59
    Act Density 0.059%

    No Known Activations