INDEX
    Explanations

    phrases that denote purpose or reason within the text

    Negative Logits
    utar          -0.17
    -arm          -0.15
    iterr         -0.14
    morph         -0.14
    ande          -0.14
    еж            -0.14
    lland         -0.14
    modo          -0.14
    562           -0.13
    uito          -0.13
    Positive Logits
    ilst          0.16
    ernel         0.15
    .synthetic    0.15
    rea           0.15
    904           0.14
    ANO           0.14
    antt          0.14
    OLS           0.14
    aves          0.14
    اÙĨÙĪ          0.13
    Act Density 0.242%

    No Known Activations