INDEX
    Explanations
    NEGATIVE LOGITS
    " المر"         -0.06
    " Sandra"       -0.06
    "(Icons"        -0.06
    "Main"          -0.06
    "fp"            -0.06
    " ${"           -0.06
    " sexe"         -0.06
    "Jan"           -0.06
    " Paramount"    -0.06
    "MH"            -0.06
    POSITIVE LOGITS
    "ED"            0.12
    "ed"            0.11
    " ed"           0.08
    "Med"           0.08
    "Fed"           0.07
    " ned"          0.07
    "вед"           0.07
    "wed"           0.07
    "ked"           0.07
    "ед"            0.07
    Activation Density 0.108%

    No Known Activations