INDEX
    Explanations

    words and phrases that indicate quotations or citations

    numbers and punctuation followed by words

    Negative Logits
    <unused8>    -1.04
    [@BOS@]      -1.04
    <unused41>   -1.04
    <unused68>   -1.04
    <unused16>   -1.03
    <unused14>   -1.03
    <unused23>   -1.03
    <unused28>   -1.03
    <pad>        -1.03
    <unused3>    -1.03
    Positive Logits
     elétrico    0.27
    ↵↵           0.26
     resultaat   0.25
     sekunder    0.24
     samym       0.24
     elástico    0.23
     actuales    0.23
     atuais      0.23
     resultaten  0.23
     działania   0.22
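Feature dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The following is a minimal sketch of that idea, assuming that method; the function names, the dictionary-based unembedding, and the toy vectors are hypothetical and are not necessarily how the values above were produced.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: direct logit effect of one feature direction.

    decoder_dir: the feature's direction in the residual stream (length d_model).
    unembed: maps each token string to its unembedding vector (length d_model).
    Returns token -> dot product of decoder_dir with that token's vector.
    """
    return {tok: sum(a * b for a, b in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens by effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with a 2-dimensional "residual stream" (illustrative values only).
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0], "d": [0.0, -1.0]}
effects = logit_effects([1.0, 2.0], unembed)
negative, positive = top_and_bottom(effects, 2)
print(negative)  # [('d', -2.0), ('c', -1.0)]
print(positive)  # [('b', 2.0), ('a', 1.0)]
```

Under this reading, a list dominated by special tokens such as <unused…> and <pad> on the negative side means the feature direction mainly suppresses tokens that rarely appear in ordinary text.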
    Activation Density: 0.738%
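Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the function name and the threshold parameter are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# A feature active on 738 of 100,000 sampled tokens has a density of 0.738%.
print(activation_density([1.0] * 738 + [0.0] * (100_000 - 738)))  # 0.738
```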

    No Known Activations