Explanations
This neuron primarily activates on the words “they” and “do,” effectively detecting occurrences of the pronoun “they” (often in the phrase “they do”).
Negative Logits
Token       Logit
Category    -0.07
Tags        -0.07
veter       -0.06
Privacy     -0.06
sneakers    -0.06
Guys        -0.06
اصل         -0.06
saints      -0.06
reunion     -0.06
,但          -0.06
Positive Logits
Token       Logit
shall       0.06
setContent  0.06
odyn        0.06
ナ           0.06
WARDED      0.06
лением      0.06
허           0.06
SB          0.06
ίζει        0.06
busiest     0.06
Activations Density 0.000%