    Explanations

    describing how something is done

    Negative Logits
    Token               Logit
    (not rendered)      -2.75
    (not rendered)      -2.73
    "aktionen"          -2.61
    " alojamientos"     -2.59
    "dáms"              -2.59
    " ruinas"           -2.59
    " みた"             -2.53
    "tentu"             -2.45
    " profondo"         -2.42
    (not rendered)      -2.42

    Positive Logits
    Token               Logit
    "<td>"              3.31
    "."                 3.30
    "This"              2.64
    "ка"                2.55
    "<em>"              2.53
    " Many"             2.52
    "re"                2.42
    "是被"              2.39
    " From"             2.36
    "on"                2.30
    Activation Density: 0.004%

    No Known Activations