INDEX
Explanations
collaborations
The neuron activates on tokens describing the model’s provenance, i.e., mentions of its development or joint training by institutions (such as “developed,” “trained,” institution names, and dates).
Negative Logits
Haupt      -0.07
هفت        -0.07
arring     -0.06
paring     -0.06
Bearings   -0.06
пен        -0.06
HID        -0.06
wash       -0.06
kort       -0.06
andest     -0.06
Positive Logits
(fs         0.08
<li         0.06
'])?        0.06
acea        0.06
nostalg     0.06
.Protocol   0.06
>>>(        0.06
341         0.06
acompañ     0.06
missed      0.06
Activations Density 0.004%