INDEX
    Explanations

    NEGATIVE LOGITS
    -0.60  " lights"
    -0.56  " Break"
    -0.54  " break"
    -0.53  " rockets"
    -0.52  " Vere"
    -0.51  " themes"
    -0.50  " breaks"
    -0.50  "--]"
    -0.50  " parades"
    -0.50  " spots"
    POSITIVE LOGITS
    1.05  " تضيفلها"
    0.70  " Vikipedi"
    0.63  " يتيمه"
    0.63  "onomía"
    0.62  " thérape"
    0.62  " ویکی‌پدیا"
    0.61  " ferdig"
    0.61  " témoins"
    0.60  " للمعارف"
    0.60  " redor"
    Activation Density: 0.017%

    No Known Activations