INDEX
    Explanations

    phrases indicating causal relationships and important points within a text

    Negative Logits
    Token          Logit
    "ubits"        -0.17
    "egra"         -0.16
    "readcr"       -0.15
    " доÑĢ"        -0.15
    "ики"          -0.14
    "peare"        -0.14
    "ARGET"        -0.14
    "ÏĢλα"         -0.14
    " trouble"     -0.14
    "alie"         -0.14

    Positive Logits
    Token          Logit
    "ÂŃi"          0.15
    "ulares"       0.14
    "jom"          0.14
    " vô"          0.14
    "AUTHORIZED"   0.14
    " Fol"         0.13
    " prompt"      0.13
    "acus"         0.13
    " net"         0.13
    " conclude"    0.13
    Activation Density: 0.295%

    No Known Activations