    Negative Logits
    " ignorant"      -0.07
    " pokus"         -0.07
    " breeding"      -0.06
    " optics"        -0.06
    " strangely"     -0.06
    " Ring"          -0.06
    "Editors"        -0.06
    " Beled"         -0.06
    " dicks"         -0.06
    " Signed"        -0.06
    Positive Logits
    "ودی"             0.07
    "Ba"              0.07
                      0.07
    "вою"             0.06
    "मह"              0.06
    "_BOTH"           0.06
    "Create"          0.06
    " Home"           0.06
    " university"     0.06
    " hare"           0.06
    Activation Density: 0.098%

    No Known Activations