INDEX
Explanations
The neuron fires most strongly on technical, topic‐introducing vocabulary—especially when new concepts or specialized terms (e.g. “privacy,” “ontology,” “quantification,” “contribution”) are first presented.
New Auto-Interp
Negative Logits
(Layout
-0.07
ğu
-0.07
Roses
-0.06
раді
-0.06
lod
-0.06
riage
-0.06
Kelly
-0.06
privilege
-0.06
wipe
-0.06
انية
-0.06
POSITIVE LOGITS
المر
0.07
invoke
0.07
zure
0.06
یکی
0.06
乗
0.06
타
0.06
regulate
0.06
↑
0.06
ItemStack
0.06
Means
0.06
Activations Density 0.028%