INDEX
    Explanations

    dates, categories, or locations

    Negative Logits
     علاوه          0.45
    .               0.44
    ства            0.39
    ها              0.39
    IsPass          0.38
    Estimation      0.38
    WithType        0.35
    ولا             0.35
    яви             0.34
     والاست        0.34
    Positive Logits
    ü               0.53
    d               0.52
    to              0.52
     to             0.50
    ă               0.45
                    0.44
     ﺍﻟ             0.44
    ı               0.43
    im              0.42
    َی              0.42
    Act Density 2.242%

    No Known Activations