    Explanations

    introducing purpose or consequence

    Negative Logits
     because     -1.57
     and         -1.55
     eftersom    -1.43
     since       -1.40
     that        -1.38
     dlatego     -1.27
     omdat       -1.23
     потому      -1.16
     ponieważ    -1.15
     protože     -1.13
    Positive Logits
     they        2.31
     we          1.92
     cuando      1.75
     ketika      1.73
     можно       1.69
     lorsque     1.64
     can         1.59
     nantinya    1.59
     later       1.57
     można       1.53
    Activation Density: 0.029%

    No Known Activations