Explanations
terms related to malware or cyber attacks
references to a specific group or entity, particularly "trojan."
Negative Logits
++++++++++++++++  -0.76
hips              -0.75
rentices          -0.74
maxwell           -0.71
ulse              -0.69
urities           -0.69
orship            -0.66
discriminating    -0.66
UAL               -0.63
ukong             -0.63
Positive Logits
tro               1.27
dden              1.08
tted              0.95
tsky              0.95
tro               0.93
oping             0.86
opard             0.80
isure             0.78
estones           0.76
bled              0.75
Activations Density 0.007%