INDEX
    Explanations

    The neuron reliably activates on the adjective “rare” (and its morphological variants like “rarely”) when it labels or introduces something uncommon.

    Negative Logits
    Token             Logit
    "-plan"           -0.09
    "Initializing"    -0.08
    "elson"           -0.07
    "-transform"      -0.07
    "\tmv"            -0.07
    " boost"          -0.07
    " імп"            -0.07
    "inth"            -0.07
    " headings"       -0.07
    "Guess"           -0.07
    Positive Logits
    Token                        Logit
    " rare"                      0.18
    " Rare"                      0.12
    " rarity"                    0.12
    "Rare"                       0.10
    " rarely"                    0.09
    " rar"                       0.08
    " Rarity"                    0.08
    (token missing in source)    0.08
    " rice"                      0.07
    " occasionally"              0.07
    Activation Density: 0.008%
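
The density figure above is not defined in the source. A minimal sketch of one plausible reading follows, assuming it is the fraction of sampled token positions on which the feature's activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so the arithmetic lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1 active position out of 12,500.
acts = [0.0] * 12_499 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.008% corresponds to roughly one active token in every 12,500 sampled, which fits a feature tied to a single word family.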

    No Known Activations