Neuronpedia

SCORE TYPE

eleuther_fuzz

Description

Asks a model whether a set of selected tokens should activate a feature, given an explanation of that feature. Activating contexts are sampled from different quantiles of the full distribution of activating contexts. In non-activating contexts, random tokens are highlighted in the same proportion as in the activating contexts.
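As a rough illustration of how a non-activating context might be prepared, the sketch below highlights a random subset of tokens at a given proportion. The function name `highlight_random_tokens`, the `<<`/`>>` delimiters, and the rounding rule are assumptions made for this example, not details taken from the sae-auto-interp code.

```python
import random


def highlight_random_tokens(tokens, proportion, seed=0, left="<<", right=">>"):
    """Wrap a random subset of tokens in delimiters.

    Hypothetical helper: `proportion` is the fraction of tokens that were
    highlighted in the activating contexts; the delimiters are illustrative.
    """
    if not tokens:
        return []
    rng = random.Random(seed)
    # Assumption: round to the nearest count, but highlight at least one
    # token and never more than the context contains.
    k = min(len(tokens), max(1, round(proportion * len(tokens))))
    chosen = set(rng.sample(range(len(tokens)), k))
    return [f"{left}{tok}{right}" if i in chosen else tok
            for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    context = "the cat sat on the mat today ok".split()
    print(" ".join(highlight_random_tokens(context, proportion=0.25)))
```

Fixing the seed keeps the example reproducible; which tokens end up highlighted is otherwise arbitrary, which is the point of the non-activating case.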

Author

EleutherAI

URL

https://github.com/EleutherAI/sae-auto-interp

Score Calculation

Score = (true positive rate + true negative rate) / 2. The true positive rate is TP/P: the number of times the model correctly predicted the feature was active, divided by the number of times it was active. The true negative rate is TN/N: the number of times the model correctly predicted the feature was not active, divided by the number of times it was not active.
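The formula above is the usual balanced accuracy. A minimal sketch of the arithmetic follows, assuming predictions and ground-truth labels are available as parallel lists of booleans; the function name `fuzz_score` is hypothetical.

```python
def fuzz_score(predictions, labels):
    """Return (TPR + TNR) / 2 for parallel lists of booleans.

    predictions[i] is True if the model said the feature was active;
    labels[i] is True if the feature actually was active.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        # Both rates need a non-empty denominator.
        raise ValueError("need at least one active and one inactive example")
    return (tp / positives + tn / negatives) / 2


if __name__ == "__main__":
    preds = [True, True, False, False]
    truth = [True, False, False, False]
    # TPR = 1/1, TNR = 2/3
    print(fuzz_score(preds, truth))
```

Averaging the two rates means a model that always answers "active" (or always "not active") scores 0.5, regardless of how many examples of each kind were sampled.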

Settings

Samples 10 contexts from each of 10 quantiles, plus 100 non-activating contexts. Uses temperature 0.7 and a maximum of 500 returned tokens.
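One way to picture the quantile sampling is sketched below. It assumes each activating context is summarized by a single activation value and that quantiles are equal-sized bins over the ranked values; the function name `sample_by_quantile` and both assumptions are illustrative and may differ from what sae-auto-interp does.

```python
import random


def sample_by_quantile(activations, n_quantiles=10, per_quantile=10, seed=0):
    """Split ranked activations into equal-sized bins and sample from each.

    Hypothetical helper: returns a list of `n_quantiles` lists, strongest
    bin first. A bin with fewer than `per_quantile` items is returned whole.
    """
    rng = random.Random(seed)
    ranked = sorted(activations, reverse=True)
    n = len(ranked)
    samples = []
    for q in range(n_quantiles):
        # Integer arithmetic keeps the bins contiguous and non-overlapping.
        start = q * n // n_quantiles
        end = (q + 1) * n // n_quantiles
        bin_items = ranked[start:end]
        k = min(per_quantile, len(bin_items))
        samples.append(rng.sample(bin_items, k))
    return samples


if __name__ == "__main__":
    acts = [float(i) for i in range(1000)]
    picked = sample_by_quantile(acts)
    print(len(picked), [len(b) for b in picked])
```

With the default settings this yields 100 activating contexts, matching the 100 non-activating contexts so that both classes are equally represented.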

Recent Scores

Mentions uses, applications, and potential of something

Farming and agriculture

terms related to agricultural practices and research

The word "trend" (and related forms like "trending," "trends," "trend-setting," "trend-driven," "trend-line") appears in academic and technical contexts across scientific papers, reports, and professional writing. The term is used to describe general patterns, directions of change, or prevailing tendencies in data analysis, fashion, social behavior, research collaboration, and material science. The marked instances occur as standalone nouns or as components of compound adjectives, consistently referring to observable directional patterns or contemporary movements in their respective domains.

These examples contain fragments of words with selected syllables or morphemes marked, representing partial morphological units that appear within larger words across diverse technical and non-technical contexts (such as "libuv" from libraries, "NSU" from code, "edu" from place names, medical terms, and legal documents). The pattern reflects mid-word substrings that don't consistently correspond to meaningful linguistic boundaries, suggesting the selection marks text segments that may be important for tokenization, morphological analysis, or language model behavior at the subword level.

Closing parentheses or brackets that complete a citation, reference, or parenthetical remark in formal academic or legal text.

words and phrases associated with conjunctions and connections between ideas

The marked tokens appear to be fragments of words that have been split across delimiters, often appearing within proper nouns, technical terms, or compound words in diverse academic and technical texts. The patterns suggest these are either OCR/text encoding artifacts, reference citations within brackets, or deliberate word segmentation where parts of a single word are delimited separately from their surrounding context.

conjunctions and transitional phrases that signify contrast or condition

phrases emphasizing continuity or progression in narratives

references to the central participant or subject (person, animal, or designated entity) being discussed or instructed in the passage.

Preceding "Editor" or related to "Academic"