INDEX
    Explanations
    Negative Logits
    Token                 Logit
    ",param"              -0.08
    " awhile"             -0.08
    "blockquote"          -0.07
    " skyrocket"          -0.07
    " rais"               -0.07
    " Dương"              -0.07
    "appointed"           -0.07
    " pán"                -0.07
    (whitespace only)     -0.07
    " incremented"        -0.06
    Positive Logits
    Token                 Logit
    " self"               0.09
    " selfish"            0.09
    " Self"               0.08
    "Sense"               0.07
    " french"             0.07
    "TF"                  0.07
    " SELF"               0.07
    "SPI"                 0.07
    " PDF"                0.07
    "self"                0.06
    Activation Density: 0.040%

    No Known Activations