    Explanations

    Instances of a specific format or structure in the text, particularly references or signaling phrases.

    Negative Logits
    Token         Logit
    "повÑĸд"      -0.15
    "splash"      -0.15
    "acades"      -0.15
    "ermen"       -0.15
    "logg"        -0.15
    "EGIN"        -0.14
    "impse"       -0.14
    "ecut"        -0.14
    "äs"          -0.14
    "rypton"      -0.14

    Positive Logits
    Token         Logit
    " RT"         0.19
    "Assertion"   0.16
    "mie"         0.15
    " rt"         0.15
    "(rt"         0.15
    " Smooth"     0.14
    "anic"        0.14
    "urous"       0.14
    "sexual"      0.14
    "bjerg"       0.14

    Activation Density: 0.013%

    No Known Activations