INDEX
    Explanations

    negative outcomes or critical states

    Negative Logits
        évol               0.43
        nicer              0.43
        symmetrically      0.40
        clums              0.40
        びに               0.40
        coarser            0.40
        langfrist          0.40   (German: "long-term")
        remot              0.38
        wiser              0.38
        [token not shown]  0.38
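    Positive Logits
        outright           0.55
        [token not shown]  0.50
        に近い             0.49   (Japanese: "close to")
        ಅಥವಾ               0.49   (Kannada: "or")
        partiellement      0.48   (French: "partially")
        [token not shown]  0.46
        거의               0.46   (Korean: "almost")
        تقريبا             0.46   (Arabic: "approximately")
        幾乎               0.45   (Chinese: "almost")
        или                0.45   (Russian: "or")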
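    Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the highest and lowest scores. The page does not say how its lists were computed, so the sketch below only assumes that method. The vectors, the four-word vocabulary, and the helpers `project_to_vocab` and `top_logits` are all hypothetical.

```python
# Minimal sketch, assuming logit lists come from projecting a feature's
# decoder direction onto each token's unembedding vector. All data is made up.

def project_to_vocab(decoder_dir, unembed):
    """Dot product of the (hypothetical) decoder direction with each token's
    unembedding vector; returns a dict token -> score."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_logits(decoder_dir, unembed, k):
    """Return (positive, negative): the k most promoted and k most
    suppressed tokens, each as a list of (token, score) pairs."""
    scores = project_to_vocab(decoder_dir, unembed)
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # lowest scores: most suppressed
    positive = ranked[::-1][:k]    # highest scores: most promoted
    return positive, negative

# Hypothetical 3-dimensional model with a 4-token vocabulary.
decoder_dir = [0.5, -1.0, 0.25]
unembed = {
    "almost": [1.0, -0.5, 0.0],
    "or": [0.4, 0.0, 0.8],
    "wiser": [-0.2, 0.6, 0.0],
    "nicer": [0.0, 0.3, -0.4],
}

positive, negative = top_logits(decoder_dir, unembed, k=2)
print("positive:", positive)
print("negative:", negative)
```

    In this toy setup "almost" and "or" come out on the positive side and "wiser" and "nicer" on the negative side. This loosely mirrors the pattern above, but only because the numbers were chosen that way.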
    Activation Density: 0.237%
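    Activation density is usually reported as the share of sampled token positions on which the feature fires. A value of 0.237% corresponds to roughly 2.37 firing positions per 1,000 tokens. The sketch below assumes a strict `> 0` firing threshold and uses a hypothetical helper, `activation_density`, on made-up data. The page does not describe how its token sample was drawn.

```python
# Minimal sketch: activation density = fraction of token positions where
# the feature's activation exceeds a threshold (assumed here to be 0).

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical sample: the feature fires on 2 of 1,000 token positions.
acts = [0.0] * 1000
acts[17] = 3.1
acts[512] = 0.8

density = activation_density(acts)
print(f"Activation Density: {density:.3%}")  # Activation Density: 0.200%
```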

    No Known Activations