Explanations
This neuron activates on mentions of formal theoretical concepts in scientific text, especially occurrences of the word “theory” and closely related technical terms.
Negative Logits
Token        Logit
그렇          -0.07
subdued      -0.07
悉            -0.07
Bilg         -0.07
Grid         -0.07
.As          -0.07
Managing     -0.06
(fields      -0.06
Carolyn      -0.06
/default     -0.06
Positive Logits
Token                  Logit
caches                 0.06
naive                  0.06
hesion                 0.06
chicken                0.06
acha                   0.06
const                  0.06
apologize              0.06
(token not rendered)   0.06
На                     0.06
↵↵↵                    0.06
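Lists like the negative and positive logits are commonly derived by projecting a neuron's output weight vector through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below illustrates that idea under stated assumptions: `w_out`, `W_U`, the vocabulary, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical toy values, not this model's weights or the dashboard's actual code.

```python
def logit_effects(w_out, W_U, vocab):
    """Score each token by the dot product of w_out with that token's unembedding column.

    Assumption: W_U is stored as d_model rows by len(vocab) columns.
    """
    d_model = len(w_out)
    scores = []
    for j, tok in enumerate(vocab):
        score = sum(w_out[i] * W_U[i][j] for i in range(d_model))
        scores.append((tok, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ordered = sorted(scores, key=lambda pair: pair[1])
    negative = ordered[:k]          # lowest scores first
    positive = ordered[::-1][:k]    # highest scores first
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_out = [1.0, -2.0]
W_U = [
    [0.5, 0.0, -1.0, 0.25],   # residual dimension 0
    [0.0, 0.5, 0.25, -0.5],   # residual dimension 1
]

scores = logit_effects(w_out, W_U, vocab)
negative, positive = top_and_bottom(scores, k=2)
print("negative:", negative)
print("positive:", positive)
```

On a real model the same projection runs over the full vocabulary, which is why small values such as ±0.06 can still be the extremes of the list.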
Activation density: 0.035%
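Activation density is typically read as the fraction of sampled tokens on which the neuron fires, so 0.035% corresponds to roughly 35 active tokens per 100,000. The following minimal sketch assumes "fires" means an activation strictly above a threshold of 0; both the threshold and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold (assumed to be 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 35 are active.
acts = [0.0] * 100_000
for i in range(0, 35 * 1000, 1000):  # 35 evenly spaced positions
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.035%
```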