INDEX
    Explanations
    Negative Logits
     antioxidants    -0.08
     vaccines        -0.07
    issant           -0.07
     Innoc           -0.07
     constrained     -0.06
     GLOBAL          -0.06
     homosexual      -0.06
    [@"              -0.06
    eut              -0.06
    .foo             -0.06
    Positive Logits
     elaborate       0.11
     elabor          0.09
     effort          0.07
    larg             0.07
     efforts         0.07
     tehlik          0.07
    flush            0.07
    Sorry            0.06
                     0.06
     programm        0.06
    Act Density 0.007%

    No Known Activations