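    Negative Logits (tokens quoted; a leading space is part of the token)
    Token                Logit   Note
    " then"              -1.59
    "then"               -1.34
    " entonces"          -1.30   Spanish "then"
    " THEN"              -1.27
    "Then"               -1.21
    "THEN"               -1.20
    " allora"            -1.14   Italian "then"
    " Then"              -1.13
    "alors"              -1.09   French "then"
    "IntoConstraints"    -1.09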
    Positive Logits
    Token                Logit
    " went"               0.67
    " to"                 0.62
    " got"                0.62
    " took"               0.62
    " asked"              0.60
    " came"               0.59
    " was"                0.58
    " being"              0.58
    " gave"               0.58
    " had"                0.57
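    The two lists read as a ranking of vocabulary tokens by how strongly this feature pushes each token's logit down (negative) or up (positive). A minimal sketch of how such a ranking could be produced, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with the token's unembedding vector, ignoring the final LayerNorm. The names `direct_logit_effects` and `top_logits` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def direct_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector (final LayerNorm ignored)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model; real models use d_model-sized vectors.
    decoder = [1.0, -1.0]
    unembed = {" then": [-1.0, 0.5], " went": [0.5, -0.25], " to": [0.5, 0.0]}
    neg, pos = top_logits(direct_logit_effects(decoder, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; with a real vocabulary a partial selection (for example a heap) would avoid sorting every token.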
    Activation Density: 0.037%
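    A minimal sketch of the density figure, assuming "Activation Density" means the percentage of sampled tokens on which the feature's activation is strictly positive; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires (activation > threshold).

    Assumption: "fires" means strictly above `threshold`, defaulting to 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

    Under this reading, a density of 0.037% means the feature fires on roughly 37 of every 100,000 tokens.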

    No Known Activations