INDEX
    Explanations
    Negative Logits
     destro       -0.34
     disadvant    -0.32
     challeng     -0.29
     undermin     -0.29
     awa          -0.28
     rul          -0.27
     incent       -0.27
     embr         -0.26
    '."           -0.26
     distingu     -0.25
    Positive Logits
     ›            0.24
    Screenshot    0.20
     screenshots  0.20
    atform        0.20
    owned         0.19
    reenshots     0.18
    cigarettes    0.18
     guiActive    0.18
    icol          0.18
    osures        0.17
    Activation Density: 7.010%

    No Known Activations