INDEX
Explanations
demeter & phishing
Salient named entities and acronyms: proper names, titles, and specialized technical terms that signal key subjects across domains.
The neuron activates on enumerated list entries, especially section headings or item titles (e.g. anime character names) in a ranked “waifu” list.
User query topics or keywords.
Negative Logits
৭    0.16
ل    0.16
rvGroup    0.15
name    0.15
ناہ    0.15
ఫ్    0.14
অন্যান্য    0.14
کے    0.14
ว    0.14
ുവരി    0.14
Positive Logits
.    0.19
,    0.18
。    0.17
と呼ばれる    0.16
®,    0.16
®.    0.15
:    0.15
had    0.14
യുടെ    0.14
!    0.14
Activations Density: 10.766%