INDEX
    Explanations

    phrases that convey contrast, causation, or consideration

    Negative Logits
    Token          Logit
    "into"         -0.49
    "pu"           -0.45
    " заслу"       -0.44
    " became"      -0.44
    "Stop"         -0.43
    "Into"         -0.42
    " INTO"        -0.42
    " hold"        -0.41
    "had"          -0.41
    "ADE"          -0.40
    Positive Logits
    Token                Logit
    "tagHelperRunner"    0.81
    " eftersom"          0.81
    " perquè"            0.78
    " ponieważ"          0.77
    " kerana"            0.77
    " puisque"           0.76
    " poichè"            0.74
    " aunque"            0.73
    " поскольку"         0.73
    " deoarece"          0.72
    Activation Density: 0.746%

    No Known Activations