Explanations
mentions of cyber threats or malware
words related to severe conditions or risks, particularly in the context of health and safety
Negative Logits
Token            Logit
:=               -0.66
immedi           -0.66
clarity          -0.61
emphas           -0.59
amaz             -0.58
Defin            -0.56
caut             -0.55
Airl             -0.55
till             -0.54
endeavour        -0.53
Positive Logits
Token            Logit
plagiar          0.98
hoax             0.88
secretly         0.87
pedoph           0.87
illegally        0.85
faked            0.84
actually         0.80
improperly       0.79
fraud            0.77
inappropriately  0.77
Activation Density: 0.792%