    Explanations

    mentions of cyber threats or malware

    words related to severe conditions or risks, particularly in the context of health and safety

    Negative Logits
        " :="               -0.66
        " immedi"           -0.66
        " clarity"          -0.61
        " emphas"           -0.59
        "amaz"              -0.58
        " Defin"            -0.56
        " caut"             -0.55
        " Airl"             -0.55
        " till"             -0.54
        " endeavour"        -0.53
    Positive Logits
        " plagiar"          0.98
        " hoax"             0.88
        " secretly"         0.87
        " pedoph"           0.87
        " illegally"        0.85
        " faked"            0.84
        " actually"         0.80
        " improperly"       0.79
        " fraud"            0.77
        " inappropriately"  0.77
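    One common way to produce lists like the positive and negative logits above is to take the dot product of a feature's output direction with each token's unembedding vector, then keep the k most negative and k most positive scores. The sketch below assumes that method; it is not confirmed to be how this page computed its values. The function names `logit_effects` and `top_logits` are hypothetical, and the toy vectors in the usage line are made up for illustration.

```python
# Hypothetical sketch (assumed method): score each vocabulary token by the dot
# product of a feature's output direction with that token's unembedding vector,
# then report the most suppressed (negative) and most promoted (positive) tokens.
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most negative, k most positive) tokens, strongest first in each list."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy 2-d example; tokens keep their leading space, as BPE tokens do.
toy_unembed = {
    " hoax": [0.8, 0.2],
    " fraud": [0.6, 0.1],
    " clarity": [-0.5, -0.3],
    " caut": [-0.4, 0.0],
}
neg, pos = top_logits(logit_effects([1.0, 0.5], toy_unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

    In this toy case " hoax" and " fraud" come out as the promoted tokens and " clarity" and " caut" as the suppressed ones. Real dashboards typically do the same ranking over the full vocabulary with a matrix product rather than a Python loop.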
    Activation Density 0.792%
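    Activation density is usually read as the share of tokens on which the feature fires, so 0.792% corresponds to roughly 8 of every 1,000 tokens. The sketch below assumes that definition with a firing threshold of 0. The page does not state its threshold, and the function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (assumed to be 0.0 here).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens with activation strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens observed, so nothing fired
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # one of four tokens fires
```

    With one firing token out of four the result is 25.0. A feature with the density listed above would fire on about 792 of every 100,000 tokens.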

    No Known Activations