INDEX
    Explanations

    and, punctuation, structure

    Negative Logits
    "elabel"        0.44
    "ହା"            0.42
    " twofold"      0.42
    "тель"          0.41
    "alda"          0.41
    " ñ"            0.40
    "olist"         0.39
    "afstand"       0.39
    " confronts"    0.39
    "obatan"        0.38
    Positive Logits
    "表達"          0.46
    "丝毫"          0.44
                    0.44
    " terre"        0.43
    "表达"          0.42
    " twist"        0.41
    " wewnętr"      0.40
    "μου"           0.39
    " Twist"        0.38
                    0.38
    Act Density 4.447%

    No Known Activations