    NEGATIVE LOGITS
    STDOUT            -0.07
    _POOL             -0.06
     irony            -0.06
    تد                -0.06
    .integration      -0.06
    HasBeenSet        -0.06
    <context          -0.06
    "This             -0.06
     toddlers         -0.06
     ideologies       -0.06
    POSITIVE LOGITS
     combating        0.07
    .prof             0.06
    ster              0.06
     obtaining        0.06
    make              0.06
    Brain             0.06
     Bun              0.06
     กำ               0.06
     Bal              0.06
    оит               0.06
    Act Density 0.000%

    No Known Activations