INDEX
    Explanations

    instances of articles and demonstrative adjectives indicating specificity or quantity

    Negative Logits
    "adoo"        -0.17
    "hiba"        -0.16
    "emma"        -0.16
    "enda"        -0.15
    "opoulos"     -0.15
    "anel"        -0.14
    ".criteria"   -0.14
    "æĪ¶"         -0.14
    "Äįka"        -0.14
    "æk"          -0.14

    Positive Logits
    " æĶ"          0.16
    "ãĥ³ãĥģ"       0.15
    " lan"         0.15
    "iding"        0.14
    "_swap"        0.14
    " Lor"         0.14
    "-await"       0.14
    "alker"        0.13
    "anko"         0.13
    " scand"       0.13

    Activation Density: 0.067%

    No Known Activations