    Explanations

    mentions of unchecked situations or actions that may lead to escalation or negative consequences

    Negative Logits
        Token              Logit
        "compromising"     -0.62
        "solicited"        -0.61
        "intelligible"     -0.58
        "sightly"          -0.55
        "comfor"           -0.52
        "djang"            -0.51
                           -0.50
        "ilever"           -0.49
        " itinéraire"      -0.49
        "lwjgl"            -0.49
    Positive Logits
        Token              Logit
        " paff"             0.95
        " territo"          0.87
        " tramont"          0.86
        " vns"              0.85
        " meis"             0.82
        " vnt"              0.81
        " fuo"              0.80
        " monaster"         0.80
        " chèvre"           0.80
        " fua"              0.80
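    The two lists rank vocabulary tokens by how strongly this feature lowers (negative) or raises (positive) their output logits. This page does not state how the scores were computed. One common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort. The sketch uses pure Python and a tiny made-up vocabulary; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not the dashboard's own code.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical helper)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative

# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.9, 0.1], "b": [-0.2, 0.8], "c": [0.3, 0.3]}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

    With real model weights the same sort over the full vocabulary would yield top-10 lists shaped like the ones above, under the stated assumption about the scoring rule.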
    Activation Density: 0.200%
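    An activation density of 0.200% means the feature fires on about 2 of every 1,000 tokens in whatever text was sampled. Here is a minimal sketch of that calculation, assuming "fires" means an activation strictly above zero. The threshold, the sample data, and the function name `activation_density` are assumptions for illustration.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total

# Hypothetical sample: 1,000 tokens, of which 2 have a positive activation.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.200%"
```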

    No Known Activations