INDEX
Negative Logits
.Tables      -0.08
devoid       -0.08
_builtin     -0.08
Lif          -0.08
확인         -0.08
പരിശോധ       -0.08
podium       -0.08
widths       -0.07
관한         -0.07
confirms     -0.07
Positive Logits
malicious    0.13
stealth      0.11
deception    0.10
phishing     0.10
deceptive    0.10
攻击         0.10
欺           0.10
defensive    0.10
Feign        0.10
malware      0.09
Activation Density: 0.009%