INDEX
    Explanations

    the word "to," indicating actions or purposes

    Negative Logits
     the           -0.43
    ,              -0.43
     what          -0.35
     mundiales     -0.31
     têtes         -0.30
     histórica     -0.29
    inqui          -0.29
     official      -0.29
     news          -0.28
     refuer        -0.28
    Positive Logits
     utafitiHapana  0.95
    出版年          0.89
    <unused41>      0.88
    [@BOS@]         0.88
    <unused51>      0.88
    <unused16>      0.88
    <unused43>      0.88
    <unused42>      0.88
    <unused3>       0.88
    <unused14>      0.88
    Act Density 0.022%

    No Known Activations