    Explanations

    references to specific religious figures or practices

    Negative Logits
    uo       -0.18
    ूत       -0.17
    rale     -0.17
     disp    -0.15
    ARRIER   -0.15
    erro     -0.15
    iem      -0.15
    ambi     -0.15
    eno      -0.15
    oders    -0.15

    Positive Logits
    opi       0.27
    anes      0.25
    opal      0.25
    hat       0.24
    op        0.23
    wal       0.21
    ajar      0.21
    opis      0.21
    aur       0.20
    urga      0.20

    Act Density 0.014%

    No Known Activations