INDEX
    Explanations

    instances of high impact or critical information

    Negative Logits
     ðŁĴ
    -0.27
     ðŁij
    -0.23
     ðŁĶ
    -0.23
     ðŁ
    -0.23
    ðŁĴ
    -0.22
     selfie
    -0.21
    ðŁ
    -0.21
     WhatsApp
    -0.21
    https
    -0.21
     selfies
    -0.20
    Positive Logits
     homosex
    0.18
     bout
    0.17
     prob
    0.17
     prol
    0.16
     :]↵
    0.15
     orig
    0.15
     beta
    0.15
    ulti
    0.15
     sum
    0.15
     age
    0.15
    Act Density 0.013%

    No Known Activations