    Negative Logits
    Token             Logit
    " halt"           -0.08
    " الوف"           -0.07
    "00"              -0.07
    "Opp"             -0.06
    "Ð"               -0.06
    (not rendered)    -0.06
    " chtě"           -0.06
    "_tweet"          -0.06
    "Η"               -0.06
    (not rendered)    -0.06

    Positive Logits
    Token             Logit
    " figured"        0.13
    " figuring"       0.08
    " reckon"         0.08
    " figure"         0.07
    "ейств"           0.07
    "(power"          0.06
    " broadly"        0.06
    "(peer"           0.06
    " Pager"          0.06
    " lever"          0.06
    Activation Density: 0.003%

    No Known Activations