Explanations
The neuron responds to modal qualifiers, firing most strongly on words like “possible” or “necessary.”
Negative Logits
Token                   Logit
uters                   -0.07
HTTPRequestOperation    -0.06
tragedies               -0.06
urpose                  -0.06
fried                   -0.06
.orange                 -0.06
brief                   -0.06
-graph                  -0.06
rieben                  -0.06
случай                  -0.06
Positive Logits
Token                   Logit
Roma                    0.07
살                      0.07
unb                     0.06
еров                    0.06
Lond                    0.06
hava                    0.06
****↵                   0.06
Rarity                  0.06
syncing                 0.06
exter                   0.06
Activations Density 0.022%