INDEX
    Explanations

    references to sin or sinful actions

    Negative Logits
    innacle    -0.16
    hrad       -0.15
    dy         -0.15
    ober       -0.15
    ľ          -0.14
    a          -0.14
     Obst      -0.14
    ahn        -0.14
    nants      -0.14
    enga       -0.14
    Positive Logits
    fully               0.15
    ëį                  0.15
    еди                 0.14
    ứng                 0.14
    ably                0.14
    ples                0.14
    acer                0.14
    abcdefghijklmnop    0.14
    ê·ľ                 0.14
    ously               0.14
    Activation Density: 0.014%

    No Known Activations