INDEX
    Explanations

    references to religious and spiritual concepts

    Negative Logits
    Token       Logit
    "inters"    -0.15
    "atan"      -0.14
    "pez"       -0.14
    "ÑĢеÑĪ"      -0.14
    "lander"    -0.13
    "none"      -0.13
    "fty"       -0.13
    "inez"      -0.13
    "ertz"      -0.12
    "olkien"    -0.12
    Positive Logits
    Token       Logit
    " term"     0.54
    " word"     0.39
    "Term"      0.36
    " Term"     0.36
    "term"      0.36
    " TERM"     0.34
    "_term"     0.32
    "-term"     0.31
    " phrase"   0.31
    "TERM"      0.30
    Activation Density: 0.497%

    No Known Activations