INDEX
    Explanations

    language that expresses contrast and conditionality

    Negative Logits
    Token          Logit
    "raw"          -0.43
    " original"    -0.42
    (whitespace)   -0.41
    "kle"          -0.39
    " es"          -0.38
    "oka"          -0.38
    " voran"       -0.38
    "stat"         -0.38
    " sharing"     -0.37
    " پار"         -0.37
    Positive Logits
    Token          Logit
    " when"        1.60
    " during"      1.42
    "when"         1.35
    " cuando"      1.35
    "during"       1.34
    " DURING"      1.29
    " quando"      1.23
    " later"       1.22
    " WHEN"        1.21
    " lorsque"     1.20
    Act Density 0.931%

    No Known Activations