INDEX
    Explanations

    whether to present options

    Negative Logits
    Token        Value
    " both"      0.50
    "both"       0.46
    " nejen"     0.45
    " ليس"       0.43
    " wouldn"    0.43
    " unless"    0.43
    " cả"        0.42
    " både"      0.42
    " नही"       0.42
    " બંને"      0.41

    Positive Logits
    Token        Value
    " merely"    0.56
    "whether"    0.53
    "是通过"     0.51
    " речь"      0.50
    " whether"   0.50
    "只是"       0.49
    " purely"    0.49
    " 是否"      0.49
    "仅仅"       0.46
    "just"       0.44
    Activation Density: 0.033%

    No Known Activations