Explanations
The neuron detects mentions of malware family names (tokens like “NAME_1” introduced after words such as “called” or “named”).
Negative Logits
ими        -0.08
NAS        -0.06
Movie      -0.06
defin      -0.06
heroine    -0.06
tc         -0.06
locate     -0.06
legal      -0.06
curator    -0.06
альной     -0.06
Positive Logits
tain          0.07
otten         0.06
Annunci       0.06
vaping        0.06
εισ           0.06
intelligence  0.06
ğit           0.06
prezident     0.06
_sn           0.06
mHandler      0.06
Activations Density 0.009%