Explanations
The neuron fires on words indicating material damage, specifically tears, punctures, or something being torn or ripped.
Negative Logits
설치    -0.06
cycle    -0.06
controlled    -0.06
align    -0.06
_META    -0.06
Insert    -0.06
doubled    -0.06
vaccination    -0.06
Lua    -0.06
<<    -0.06
Positive Logits
ADO    0.07
Pemb    0.07
písem    0.07
کری    0.07
восп    0.06
spolu    0.06
εργ    0.06
申请    0.06
گي    0.06
вок    0.06
Activations Density 0.024%