    Explanations

    phrases indicating permission or restrictions

    Negative Logits
    Token          Logit
    " McGu"        -0.15
    " Kosten"      -0.15
    "zman"         -0.15
    "ifecycle"     -0.15
    "ross"         -0.15
    "anium"        -0.14
    "oproject"     -0.14
    "atham"        -0.14
    " Bew"         -0.14
    "غال"          -0.14

    Positive Logits
    Token          Logit
    " mention"      0.42
    " mentions"     0.30
    "mention"       0.29
    " Mention"      0.27
    " ment"         0.27
    " mentioned"    0.25
    " mentioning"   0.24
    " forget"       0.23
    "mentioned"     0.21
    " worry"        0.21
    Activation Density: 0.008%

    No Known Activations