Explanations
The neuron detects descriptive mentions of undulating landscape features, especially “rolling hills.”
Negative Logits
Sure        -0.07
Insider     -0.07
edit        -0.06
ists        -0.06
harm        -0.06
'],'        -0.06
Fried       -0.06
IRECT       -0.06
spheres     -0.06
-course     -0.06
Positive Logits
Episode     0.07
gil         0.07
dolphins    0.07
iless       0.07
optgroup    0.06
Poetry      0.06
pockets     0.06
birthday    0.06
营业        0.06
attle       0.06
Activations Density: 0.005%