INDEX
Explanations
The neuron consistently activates on the word “prevent,” suggesting that it detects occurrences of that term.
Negative Logits
Token            Logit
onto             -0.07
Shirt            -0.06
kos              -0.06
αρι              -0.06
idol             -0.06
adm              -0.06
italic           -0.06
stud             -0.06
fid              -0.06
Kok              -0.06
Positive Logits
Token            Logit
prevents         0.15
prevent          0.14
prevented        0.14
preventing       0.14
prevent          0.11
Prevent          0.11
Prevention       0.10
prevention       0.08
.preventDefault  0.08
أن               0.08
Activations Density 0.022%
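
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the percentage of sampled tokens on which the neuron's activation is above zero. This is an assumption about the metric's definition, not something the dashboard states; the function name `activation_density`, the `threshold` parameter, and the toy activation values are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the dashboard): a token counts as
    "active" when the neuron's activation on it is strictly above threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (made up for illustration): 10,000 tokens, 2 of which fire.
toy_activations = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(toy_activations)
print(f"{density:.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.022% would mean the neuron fires on roughly 2 of every 10,000 sampled tokens, which is in line with a neuron that responds to a single word family.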