INDEX
    Explanations

    references to rewards and incentives for engagement

    Negative Logits
    " nearly"    -0.24
    " almost"    -0.22
    "Nearly"     -0.17
    " Nearly"    -0.17
    "almost"     -0.16
    " Almost"    -0.16
    "anker"      -0.14
    "Almost"     -0.14
    " thể"       -0.14
    "約"         -0.14
    Positive Logits
    "100"          0.40
    "500"          0.38
    "10"           0.27
    "250"          0.27
    " Hundred"     0.26
    "50"           0.26
    " hundred"     0.25
    " ten"         0.24
    "999"          0.23
    " thousand"    0.22
    Act Density 0.101%

    No Known Activations