Explanations
This neuron activates on occurrences of the word “flag” (and closely related tokens), i.e. it detects mentions of flags.
Negative Logits
vulnerabilities   -0.07
ouve              -0.07
676               -0.06
emphasizes        -0.06
dh                -0.06
839               -0.06
cem               -0.06
hor               -0.06
Tomorrow          -0.06
tissues           -0.05
Positive Logits
अम                0.07
ナ                0.07
برنامه            0.07
knights           0.07
lassian           0.07
="./              0.06
panel             0.06
매매              0.06
글                0.06
>/<               0.06
Activations Density 0.009%
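
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of tokens on which the neuron's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "density"; the dashboard does not spell it out.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the neuron fires on 9 of 100,000 tokens.
acts = [0.0] * 99_991 + [1.5] * 9
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.009%
```

Under this assumption, a density of 0.009% corresponds to roughly 9 firing tokens per 100,000, which would fit a neuron tied to a single word such as "flag".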