INDEX
    Explanations

    The presence of the word "de" in the text.

    Negative Logits
    "DEL"                 -0.74
    " Italijani"          -0.69
    "MENAFN"              -0.68
    " ProtoMessage"       -0.68
    " disambiguazione"    -0.63
    " Theſe"              -0.63
    " DEL"                -0.62
    "Autoritní"           -0.61
    "deli"                -0.57
    " éter"               -0.57
    Positive Logits
    " de"                 1.28
    " den"                0.75
    "lin"                 0.69
    "/**"                 0.68
    " def"                0.65
    "UnsafeEnabled"       0.64
    " Lin"                0.63
    " victoire"           0.60
    "makeText"            0.56
    "stens"               0.55
    Act Density 0.124%

    No Known Activations