© Neuronpedia 2026

    Neuronpedia

    SCORE TYPE
    eleuther_fuzz
    Description
    Asks a model whether a set of selected tokens should activate a feature, given an explanation of that feature. Activating contexts are sampled from different quantiles of the full distribution of activating contexts. Non-activating contexts have random tokens highlighted in the same proportion as in the activating contexts.
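The random highlighting of non-activating contexts can be sketched as below. This is only an illustration under stated assumptions, not the EleutherAI implementation: the function name `highlight_random_tokens`, the `proportion` parameter, and the choice to always highlight at least one token are hypothetical.

```python
import random


def highlight_random_tokens(tokens, proportion, rng):
    """Pick random token positions to highlight in a non-activating context.

    `proportion` is the fraction of tokens highlighted in activating
    contexts. Highlighting at least one token is an assumption of this
    sketch, not something the description states.
    """
    if not tokens:
        return []
    count = max(1, round(proportion * len(tokens)))
    count = min(count, len(tokens))
    # Sample positions without replacement and return them in reading order.
    return sorted(rng.sample(range(len(tokens)), count))


rng = random.Random(0)
tokens = ["The", "farm", "grows", "wheat", "and", "corn", "every", "year", "as", "usual"]
highlighted = highlight_random_tokens(tokens, 0.2, rng)
```

Here `highlighted` holds two distinct positions, because 20% of ten tokens is two.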
    Author
    EleutherAI
    URL
    https://github.com/EleutherAI/sae-auto-interp
    Score Calculation
    Score = (true positive rate + true negative rate)/2. The true positive rate is TP/P: the number of times the model correctly predicted the feature was active, divided by the number of times it was active. The true negative rate is TN/N: the number of times the model correctly predicted the feature was not active, divided by the number of times it was not active.
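As a minimal sketch of this calculation, the function below computes the mean of the true positive rate and the true negative rate (often called balanced accuracy) from paired predictions and ground-truth labels. The name `fuzz_score` and the boolean-list inputs are assumptions made for illustration; they are not taken from the sae-auto-interp code.

```python
def fuzz_score(predictions, labels):
    """Return (TPR + TNR) / 2 for boolean predictions and labels.

    predictions[i] is True when the model said the feature was active;
    labels[i] is True when the feature really was active.
    """
    pairs = list(zip(predictions, labels))
    positives = sum(1 for _, active in pairs if active)       # P
    negatives = sum(1 for _, active in pairs if not active)   # N
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one active and one non-active example")
    true_pos = sum(1 for pred, active in pairs if pred and active)          # TP
    true_neg = sum(1 for pred, active in pairs if not pred and not active)  # TN
    return (true_pos / positives + true_neg / negatives) / 2


labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, False, False, True, True]
score = fuzz_score(predictions, labels)  # TPR = 3/4, TNR = 2/4, score = 0.625
```

Averaging the two rates means a model that always answers "active" (or always "not active") scores 0.5, regardless of how many activating and non-activating contexts are sampled.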
    Settings
    Samples 10 contexts from each of 10 quantiles, plus 100 non-activating contexts. Uses temperature 0.7 and a maximum of 500 returned tokens.
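The sampling settings can be illustrated with a short sketch. It assumes that quantiles are equal-count bins over contexts ranked by activation strength, which the page does not specify; `sample_contexts` and its parameters are hypothetical names, not the EleutherAI API.

```python
import random


def sample_contexts(activating, non_activating, n_quantiles=10,
                    per_quantile=10, n_negative=100, seed=0):
    """Sample contexts per quantile, plus a set of non-activating contexts.

    `activating` is a list of (context, activation) pairs. Equal-count
    quantile bins are an assumption of this sketch.
    """
    rng = random.Random(seed)
    ranked = sorted(activating, key=lambda pair: pair[1])
    bin_size = len(ranked) // n_quantiles
    if bin_size < per_quantile or len(non_activating) < n_negative:
        raise ValueError("not enough contexts to sample from")
    positives = []
    for q in range(n_quantiles):
        start = q * bin_size
        # The last bin absorbs any remainder left by integer division.
        end = (q + 1) * bin_size if q < n_quantiles - 1 else len(ranked)
        positives.extend(rng.sample(ranked[start:end], per_quantile))
    negatives = rng.sample(non_activating, n_negative)
    return positives, negatives


activating = [(f"ctx{i}", float(i)) for i in range(1000)]
non_activating = [f"neg{i}" for i in range(500)]
positives, negatives = sample_contexts(activating, non_activating)
```

With the defaults this yields 100 activating and 100 non-activating contexts, so P and N in the score calculation are equal.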
    Recent Scores
    Mentions uses, applications, and potential of something
    Mentions uses, applications, and potential of something
    Mentions uses, applications, and potential of something
    Mentions uses, applications, and potential of something
    Mentions uses, applications, and potential of something
    Farming and agriculture
    Farming and agriculture
    Farming and agriculture
    Farming and agriculture
    Farming and agriculture
    terms related to agricultural practices and research
    The word "trend" (and related forms like "trending," "trends," "trend-setting," "trend-driven," "trend-line") appears in academic and technical contexts across scientific papers, reports, and professional writing. The term is used to describe general patterns, directions of change, or prevailing tendencies in data analysis, fashion, social behavior, research collaboration, and material science. The marked instances occur as standalone nouns or as components of compound adjectives, consistently referring to observable directional patterns or contemporary movements in their respective domains.
    These examples contain fragments of words with selected syllables or morphemes marked, representing partial morphological units that appear within larger words across diverse technical and non-technical contexts (such as "libuv" from libraries, "NSU" from code, "edu" from place names, medical terms, and legal documents). The pattern reflects mid-word substrings that don't consistently correspond to meaningful linguistic boundaries, suggesting the selection marks text segments that may be important for tokenization, morphological analysis, or language model behavior at the subword level.
    Closing parentheses or brackets that complete a citation, reference, or parenthetical remark in formal academic or legal text.
    words and phrases associated with conjunctions and connections between ideas
    The marked tokens appear to be fragments of words that have been split across delimiters, often appearing within proper nouns, technical terms, or compound words in diverse academic and technical texts. The patterns suggest these are either OCR/text encoding artifacts, reference citations within brackets, or deliberate word segmentation where parts of a single word are delimited separately from their surrounding context.
    conjunctions and transitional phrases that signify contrast or condition
    phrases emphasizing continuity or progression in narratives
    references to the central participant or subject (person, animal, or designated entity) being discussed or instructed in the passage.
    Preceding "Editor" or related to "Academic"