    Negative Logits
    Token             Logit
    "ис"              -0.07
    " Gilbert"        -0.07
    "istrar"          -0.06
    " gradients"      -0.06
    ".ColumnStyle"    -0.06
    "Frequency"       -0.05
    " Diesel"         -0.05
    " ninja"          -0.05
    " skirt"          -0.05
    " زاده"           -0.05
    Positive Logits
    Token             Logit
    " hoping"         0.18
    " hoped"          0.16
    " hope"           0.16
    " hopes"          0.15
    " Hope"           0.14
    "Hope"            0.11
    "hope"            0.09
    " hopefully"      0.08
    " hopeful"        0.08
    " Sadly"          0.08
    Act Density: 0.022%

    No Known Activations