INDEX
    Explanations

    phrases that signal formal, structured exposition—disclaimers, summaries, signposting, and instructional framing within a response.

    Negative Logits
    AIRMAN      0.54
    สต          0.46
     الأخ       0.45
     الاخ       0.45
    BIN         0.42
    ف           0.41
    KU          0.40
     INumber    0.40
    ضيف         0.39
     CORPER     0.39
    Positive Logits
     skilled    0.48
     Village    0.44
    食品        0.44
     Detect     0.43
     gauche     0.43
     Crest      0.43
     Skilled    0.43
     Caball     0.43
     Inspect    0.43
     Vig        0.42
    Act Density 0.022%

    No Known Activations