INDEX
Explanations
This neuron activates on numeric probability values (decimal numbers and fractions) in the text.
Negative Logits
.cond       -0.07
Equip       -0.07
�           -0.07
Train       -0.06
probes      -0.06
lete        -0.06
sucht       -0.06
독          -0.06
_No         -0.06
Marin       -0.06
Positive Logits
thora       0.08
|↵          0.08
(criteria   0.08
砲          0.07
jas         0.07
目录        0.07
接着        0.07
wildlife    0.07
abilidade   0.07
asury       0.06
Activations Density 0.001%