INDEX
    Explanations

    comparisons stating that one thing is better than another

    Negative Logits
        " emphat"     -1.28
        " ftu"        -1.22
        " increa"     -1.20
        " aen"        -1.19
        " Lég"        -1.18
        " Juf"        -1.18
        " „,"         -1.17
        " lele"       -1.16
        " fta"        -1.15
        " meis"       -1.14
    Positive Logits
        " worse"       0.63
        " sacrifice"   0.54
        "マシ"          0.54
        "atience"      0.52
        " worst"       0.52
        " Kč"          0.51
        " than"        0.51
        " risk"        0.51
        " lieber"      0.51
        " losing"      0.50
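    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists are commonly produced, assuming the feature's decoder direction is projected through the model's unembedding matrix and layer norm is ignored. The names `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical, and the toy numbers are illustrative only, not this feature's real weights.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumption: the effect on token j is the dot product of the feature's
    decoder direction with column j of W_U (shape d_model x vocab_size).
    """
    d_model = len(decoder_dir)
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with the token's unembedding column.
        score = sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        effects.append((token, score))
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return negative, positive


# Toy example (hypothetical values): d_model = 2, vocabulary of 4 tokens.
vocab = [" worse", " than", " emphat", " ftu"]
W_U = [
    [1.0, 0.5, -1.0, -0.5],
    [0.0, 0.5, -0.5, -1.0],
]
decoder_dir = [0.6, 0.2]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

    Leading spaces are kept as part of each token string, which is why the dashboard distinguishes " worse" from a word-internal fragment such as "atience".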
    Activation Density: 0.270%
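    An activation density of 0.270% means the feature fires on roughly 2.7 of every 1,000 tokens. A minimal sketch of the calculation, assuming density is the fraction of sampled tokens whose activation exceeds zero; the threshold, the helper name `activation_density`, and the sample below are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 27 active tokens out of 10,000.
acts = [1.0] * 27 + [0.0] * 9_973
density = activation_density(acts)
print(f"{density:.3%}")
```

    With 27 active tokens in a 10,000-token sample, the printed value matches the dashboard's 0.270% figure.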

    No Known Activations