Explanations
The neuron detects mentions of record‐keeping terms (e.g., “records,” “reported,” “reports”).
Negative Logits
Token       Logit
ПО          -0.08
Given       -0.07
Slot        -0.07
.unit       -0.07
ोप          -0.07
El          -0.07
Tell        -0.06
White       -0.06
مؤ          -0.06
الق         -0.06
Positive Logits
Token       Logit
records     0.11
registers   0.07
Records     0.07
solitude    0.06
continuity  0.06
registry    0.06
associates  0.06
reclaimed   0.06
ONLY        0.06
retrieving  0.06
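The two lists above rank vocabulary tokens by how strongly this neuron pushes their logits up or down. One common way to produce such lists, assumed here rather than confirmed by this page, is the direct logit effect: the dot product of the neuron's output weight vector with each token's unembedding vector, followed by sorting. The sketch below is plain Python; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not the real model weights behind the numbers above.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    w_out: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed direct effect of one neuron on each token's logit.

    Hypothetical: effect(token) = dot(w_out, unembedding vector of token).
    """
    return {
        tok: sum(w * u for w, u in zip(w_out, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional weights, invented purely for illustration.
    w_out = [1.0, -0.5]
    unembed = {
        "records": [0.2, 0.0],
        "Given": [-0.1, 0.0],
        "registry": [0.1, 0.1],
    }
    effects = logit_effects(w_out, unembed)
    pos, neg = top_and_bottom(effects, k=1)
    print(pos, neg)
```

With these toy vectors, "records" comes out as the most positive token and "Given" as the most negative, mirroring the shape (not the values) of the tables above.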
Activations Density: 0.010%
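Activation density reads as the share of token positions on which the neuron fires; 0.010% is 0.0001 as a fraction, or roughly 1 token in 10,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (this page does not state the threshold it uses); `activation_density` is a hypothetical helper, not part of any particular tool.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption for illustration.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: one firing position out of 10,000 gives 0.010%.
    acts = [0.0] * 9999 + [1.3]
    print(f"{activation_density(acts):.3f}%")
```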