Explanations
The neuron detects occurrences of the adjective “optimal.”
Negative Logits
person     -0.07
guy        -0.07
created    -0.07
hide       -0.07
made       -0.07
friend     -0.06
-expand    -0.06
girls      -0.06
-desc      -0.06
animals    -0.06
Positive Logits
optimal        0.09
奥             0.08
optimum        0.08
Robbins        0.07
lepší          0.07
oint           0.07
Salmon         0.07
kval           0.07
unanimously    0.07
monetary       0.07
Activations Density: 0.006%