    Explanations

    references to religious practices and principles

    Negative Logits
    " nữa": -0.50
    "所以": -0.45
    "ubi": -0.45
    "abili": -0.44
    " Pourtant": -0.43
    "حاد": -0.42
    " Deshalb": -0.40
    " Nope": -0.40
    " đâu": -0.40
    "Moreover": -0.39

    Positive Logits
    " using": 1.56
    " ignoring": 1.28
    " keeping": 1.25
    " assuming": 1.24
    " utilizando": 1.23
    " utilizing": 1.23
    " utilizzando": 1.23
    " используя": 1.22
    " USING": 1.21
    "using": 1.20

    Activation Density: 1.022%

    No Known Activations