Explanations
The neuron activates on occurrences of the word “Teen”, including when it appears as part of “Teenager”.
Negative Logits
Map     -0.07
Rut     -0.07
Zinc    -0.07
Liu     -0.07
Hugh    -0.07
map     -0.07
Bison   -0.07
Wit     -0.07
gold    -0.06
ith     -0.06
Positive Logits
teen          0.11
Teen          0.10
adolescent    0.09
teenage       0.09
teens         0.08
Teen          0.08
teenagers     0.08
teacher       0.08
adden         0.07
adolescents   0.07
Activations Density: 0.007%
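The page does not say how the density figure is computed. A minimal sketch, assuming density means the percentage of tokens on which the neuron's activation exceeds a threshold, might look like the following. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of active tokens; the source page
    does not define the metric.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count tokens on which the neuron is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): one active token out of four.
toy = [0.0, 0.0, 1.5, 0.0]
print(f"{activation_density(toy):.3f}%")
```

Under this assumed definition, a density of 0.007% would correspond to roughly 7 active tokens per 100,000.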