INDEX
    Explanations

    references to religious figures and communities

    Negative Logits
    ンツ
    -0.16
     Laden
    -0.15
    iane
    -0.14
    ucu
    -0.14
     Metals
    -0.14
    rzy
    -0.14
    ubo
    -0.13
    udic
    -0.13
    .native
    -0.13
     Sabbath
    -0.13
    Positive Logits
    adel
    0.16
     spin
    0.15
     spun
    0.14
     Hosp
    0.14
    osp
    0.14
    abet
    0.14
    etas
    0.14
     strncpy
    0.14
    ठन
    0.13
    _iff
    0.13
    Activation Density 0.032%

    No Known Activations