Comments
Hmm, to be clear, it fires most strongly on *units and descriptions* of measurements, not the full measurements themselves (e.g. "long", "high", "metres", but not "120"). It also has "diameter", "radius", and "circumference" among its top positive logits - which are not measurement units.
It seems to fire most strongly on measurements, but also fires on colors like red, black, and white. It also seems to fire strongest when there are lists of measurements and descriptions.
Seems to happen after <|endoftext|>, might be correlated to a change in topic.
It also activates on non-Greek figures and fictional character names that don't include "the", like "Sun Tzu", "King Arthur", "Princess Celestia", "Albus Dumbledore", and "Aang".
"Majora's Mask" might be predicting "The Legend of Zelda"; "Zeus" might be predicting "Zeus the God of Lightning".
Interesting misspelling of "proper" in the generated explanation
also what the hell is MatchupNotesCompanion.jar? do you have any attribution for the activation texts at all?
it's a neuron, what do you expect XD
this neuron is a polysemantic mess
this is a super confusing/polysemantic neuron
Fires when a tech company name is an appropriate prediction for the next token.
A limited set of compound adjectives, mostly tied to specific phrases like laid-back, sold-out, grown-up (the second term is often directional or spatial).
abbreviations and acronyms in governmental and trade contexts; fires on the first token after an open bracket, and sometimes a little on the token before or after.
Text likely to appear in D&D character creation.
ugh, sorry for spamming answers on this one; GPT's explanation clearly misses an important feature but rates its own answer as the best, so I was experimenting with different phrasings to see if any would score better
Seems to somewhat target the "com" after the period, and also the period before the "youtube" in some cases
there seem to be a lot of <|endoftext|> tokens in here, which does not bode well for automated explanations
I don't even know what this is about
the correct answer got a very low ranking
Looks like it. I tested "Some characters in the game Diablo are Itherael the Archangel of Fate and Deckard Cain the Elder." and it activated on both instances of "the" (though very weakly on "Cain the Elder"). I'm not sure what to make of the other activations like "Majora's Mask", Zeus, and Obi-Wan, which are within 5% of the top activation value of the others.
Looks like it applies specifically to titles with "the" or "of"
this is another mystery :(. it looks like the neuron activates strongly on all words (so it also shows up in many different searches), but particularly strongly on legalese and the words "didn't"/"doesn't"???
ah right - it does look pretty sports-related at the top. feels like there might be more to it, though. it seems like all of the texts have some number in them, even the non-sports ones that activate less, e.g. "L.A. has reduced ozone levels by more than one-third [...] in the last 15 years". it could be that this is a "words around numbers" neuron, and it just happens that sports text tends to have numbers in this specific kind of way more frequently.
actually, probably not? this one is likely yet another sports neuron (and there *are* lots of sports neurons!), although the other activation texts are still a real head-scratcher
wow, this is one of the strangest neurons I've seen. it seemingly fires on ~all words (with the words @quercusilex373 named being the strongest?), but it's also somehow a top search result for keysmash searches 🤔
The explanation should be more specific. "Historical" is too broad for a direction that activates on Greek/Persian figures.
this direction is evidence that directions can capture more than tokens and related tokens. pretty cool that it doesn't fire on any particular tokens, and instead picks up "story synopsis" narratives.
Seems pretty polysemantic considering it's from the OAI directions.
Sorry for the double submission! I accidentally hit enter early on the first one instead of the quotation mark, and figured it would be better to submit the whole thing properly
my first response was a slightly better answer, but I had to resubmit because otherwise it didn't show it beating GPT for some reason
manner + in/by ; lens through which ; place to ; it take
I wrote the current top-scored explanation but nonnovino's is better
'Bear' as a verb and a noun, and other strong creatures
wow this one is highly polysemantic
GPT's incorrect answer, which scores more points
If you type a newline in the test, you can easily get high-scoring text. But the scorer still didn't like that answer.
GPT's explanation seems pretty weak here
this one is so weird/difficult
something interesting going on here...
soccer game recaps should probably have a higher score
I think my explanation is more specific, at least relative to the examples I was shown - the top scorer is accurate only by overgeneralizing.
This is really interesting because "h/t" means "hat tip", which is also a non-literal use of "hat", as in "hats off to"
I disagree; it seems to be pretty specifically about indicating a particular feature of a building, like the story or the condition
this is highly polysemantic I think
my topical activation text: never before have I gotten such a bad score on an explanation 🫠
This one seems to be all over the place. Probably could use some more activation texts.
stumped. seems to react highly to ".", but can't find a pattern for the context.
Looks similar to this one: https://www.neuronpedia.org/gpt2-small/6/2817
This one seems like a mess 🫤
seems like there are two explanations here - unless there is some commonality between "shalt thou" and "comes"?
Seems to be video game related, but I don't think it's actual in-game text - it looks like markdown tables. Maybe D&D?
seems to have at least two meanings