Explanations
**The neuron is looking for words related to "silence".**
instances of a specific form of the word "sil"
Negative Logits

| Token | Logit |
| --- | --- |
| OTAL | -0.74 |
| Blessed | -0.68 |
| Ancients | -0.67 |
| enegger | -0.67 |
| Briggs | -0.67 |
| damned | -0.66 |
| Das | -0.62 |
| Herald | -0.62 |
| depreciation | -0.62 |
| Mother | -0.62 |
Positive Logits

| Token | Logit |
| --- | --- |
| encers | 1.44 |
| encer | 1.43 |
| encing | 1.40 |
| enced | 1.32 |
| icone | 1.23 |
| ences | 1.20 |
| icate | 1.14 |
| icon | 1.12 |
| ica | 1.02 |
| entary | 1.00 |
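One way to read the positive logits is to join each promoted token to the token the neuron is thought to fire on. The sketch below is illustrative only: it assumes the preceding token is "sil", as the second explanation suggests, and the helper name `complete` is hypothetical. The token/value pairs are copied from the positive logits listed above.

```python
# Top positive-logit tokens and their values, copied from the dashboard.
POSITIVE_LOGITS = [
    ("encers", 1.44), ("encer", 1.43), ("encing", 1.40), ("enced", 1.32),
    ("icone", 1.23), ("ences", 1.20), ("icate", 1.14), ("icon", 1.12),
    ("ica", 1.02), ("entary", 1.00),
]


def complete(prefix, logits):
    """Join an assumed preceding token with each promoted continuation."""
    return [(prefix + token, value) for token, value in logits]


# Assumption (not stated by the dashboard): the neuron fires on "sil".
for word, value in complete("sil", POSITIVE_LOGITS):
    print(f"{word:<12}{value:.2f}")
```

Under that assumption, the joined strings include "silencers", "silencing", "silicone", and "silica", so the promoted continuations may not be limited to forms of "silence".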
Activations Density: 0.024%