INDEX
    Explanations
    NEGATIVE LOGITS
     Roughly        -1.44
     However        -1.35
    but             -1.34
    一方で          -1.32
    That            -1.30
     Typically      -1.28
    截至            -1.28
     اگر            -1.27
     Nonetheless    -1.25
     Würde          -1.24
    POSITIVE LOGITS
    itzung           1.42
    ’,               1.41
    --}}             1.35
     terceira        1.34
     menangis        1.33
     soldat          1.32
     programmation   1.32
                     1.31
    cerpts           1.30
     pesa            1.30
    Act Density 0.029%

    No Known Activations