INDEX
    Explanations
    Negative Logits
    " neste"       -0.08
    " mere"        -0.07
    "_loc"         -0.07
    "reads"        -0.07
    "assen"        -0.07
    " перера"      -0.07
    "ently"        -0.07
    " другом"      -0.07
    " rational"    -0.07
    Positive Logits
    "تن"           0.10
    " hoof"        0.08
    " Swim"        0.08
    " Ama"         0.08
    " paw"         0.07
    " Mask"        0.07
    " sce"         0.07
    " Bath"        0.07
    " Cou"         0.07
    " Flowers"     0.07
    Activation Density: 0.057%

    No Known Activations