    NEGATIVE LOGITS
    Token   Logit
    Are     0.77
    Is      0.70
    Were    0.69
    :       0.69
            0.67
            0.66
            0.63
    (...)   0.61
    ()      0.61
    ،       0.60
    POSITIVE LOGITS
    Token          Logit
     requires      1.72
     gets          1.66
     gives         1.65
     promotes      1.64
     reinforces    1.63
     reduces       1.63
     keeps         1.61
     avoids        1.60
     emphasizes    1.60
     minimizes     1.60
    Activation Density: 0.654%

    No Known Activations