INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    ")section"          -0.08
    " אתה"              -0.07
    (token missing)     -0.07
    " lecturer"         -0.07
    "ocity"             -0.07
    "ady"               -0.07
    "vention"           -0.07
    "נזק"               -0.07
    "-tag"              -0.07
    " whatever"         -0.07
    POSITIVE LOGITS
    Token               Logit
    " Stim"             0.08
    " mindfulness"      0.07
    "UST"               0.07
    "(_:"               0.07
    "Sold"              0.07
    "(acc"              0.07
    "INIT"              0.07
    "LY"                0.07
    "empl"              0.07
    " Rebels"           0.07
    Activation Density: 0.186%

    No Known Activations