Explanations
The neuron activates on occurrences of the word “satellite” (or its subword “ellite”) across diverse contexts.
Negative Logits
room        -0.08
armed       -0.08
ROOM        -0.07
rooms       -0.07
Format      -0.07
Room        -0.07
woods       -0.07
cough       -0.07
Commands    -0.07
849         -0.07
Positive Logits
satellite    0.13
satellites   0.11
Satellite    0.08
eli          0.07
ΕΤ           0.07
uelle        0.07
ετ           0.07
_PCIE        0.07
dato         0.07
sat          0.07
Activations Density: 0.003%