Explanations
The neuron reliably activates on the adjective “rare” (and its morphological variants like “rarely”) when it labels or introduces something uncommon.
Negative Logits
Token          Logit
-plan          -0.09
Initializing   -0.08
elson          -0.07
-transform     -0.07
mv             -0.07
boost          -0.07
імп            -0.07
inth           -0.07
headings       -0.07
Guess          -0.07

Positive Logits
Token          Logit
rare            0.18
Rare            0.12
rarity          0.12
Rare            0.10
rarely          0.09
rar             0.08
Rarity          0.08
र               0.08
rice            0.07
occasionally    0.07
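
Logit lists like these are commonly read as the neuron's direct effect on the vocabulary: each token's score is the dot product of the neuron's output weight vector with that token's unembedding vector, and the page shows the highest and lowest scores. The page does not state how its numbers were computed, so the sketch below is only an illustration under that assumption. The helper names `direct_logit_effects` and `top_and_bottom`, as well as the toy vectors, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: assumes each listed logit is the dot product between the
# neuron's output weight vector and a token's unembedding vector.
def direct_logit_effects(
    w_out: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Score every token by projecting the neuron's output direction onto it."""
    return {
        token: sum(w * u for w, u in zip(w_out, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy 3-dimensional example; these vectors are invented for illustration only.
w_out = [1.0, 0.0, -0.5]
unembed = {
    "rare": [0.2, 0.1, 0.0],
    "rarity": [0.1, 0.3, 0.0],
    "plan": [0.0, 0.0, 0.2],
    "boost": [-0.05, 0.0, 0.0],
}
positive, negative = top_and_bottom(direct_logit_effects(w_out, unembed), k=2)
print(positive)  # tokens: rare, rarity
print(negative)  # tokens: plan, boost
```

In a real model the vectors would have thousands of dimensions and the vocabulary tens of thousands of entries, but the ranking step is the same.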

Activation density: 0.008%
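
Activation density is usually taken to be the fraction of sampled tokens on which the neuron fires. Under that reading, 0.008% corresponds to about 8 firing tokens per 100,000, which fits a neuron tied to a single fairly uncommon word. The page gives no firing threshold, so the `threshold=0.0` default and the `activation_density` helper below are hypothetical, and the activations are synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical default: 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 8 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 12_500] = 1.0  # mark 8 evenly spaced positions as firing
density = activation_density(acts)
print(f"{density:.3%}")  # 0.008%
```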