mobcode

Operational responsibilities are good

8 January, 2008

If a developer stays on a project long enough they will inevitably be pulled into operations. Not only do they write the code, but they help make it run in production. This is a good thing. It is not necessarily glamorous, but it is good.

  • It makes a developer better because they experience the impact of their coding-time decisions.

  • It makes a developer more valuable because taking a new feature all the way through to production is worth more than writing the code and leaving it for someone else to make work.

  • It secures a developer’s position on the team because keeping the system running is a more fundamental need than writing new code.

Tell the team when you mess with a shared server

27 August, 2007

You are working on a distributed team. There are a handful of other people working on the team. You are sharing a server. When you do anything disruptive on the shared server you must tell the team.

This means you must tell the team when you do things like:

  • change the system configuration
  • start/stop a shared server process
  • run a resource-intensive operation
  • start/stop the server

At some point, everyone on the team is trying to understand why the server is doing what it is doing. Often it seems the server has a life of its own. There are enough wrinkles of complexity and unexpected dependencies in the operating system, the applications and the network interactions that it can be very hard to understand what the server is doing. This problem is compounded greatly when there is a person on the other side of the world, invisibly making changes. Now the system really does appear to have a life of its own! The other person has effectively become part of the black box everyone is trying to understand.

So whenever you are messing with a shared server, tell the team.

Even if it is just a dev server, tell the team.

Even if nobody else is online, tell the team (they are going to come online and they are going to wonder what is going on with the server).

Even if 99 times out of 100 nobody cares about your messages, tell them (you can never tell when it will be the 1 time out of 100).

Even if everyone says they don’t care about the messages, tell them (the messages are only “unnecessary” because you are sending them).

The goal is to create a virtual, shared work room. In a virtual, shared work room you could see if someone stood up from their desk and walked over to the main server and started punching buttons on it. Everyone would notice this. Even if they did not care at the moment, they would still notice and have a basic situational awareness of what is going on in their world.

So tell the team when you mess with a shared server.
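One way to make telling the team a habit is to wrap disruptive commands so the announcement cannot be forgotten. What follows is only a minimal sketch in Python, not a description of any particular team's tooling. The names `format_notice` and `run_announced` are hypothetical, and `send` stands in for whatever channel your team actually reads; it defaults to `print` so the sketch runs anywhere.

```python
import getpass
import socket
import subprocess
from datetime import datetime, timezone


def format_notice(action, user, host, when):
    """Build a one-line notice saying who is doing what, where, and when."""
    stamp = when.strftime("%Y-%m-%d %H:%M UTC")
    return f"[{stamp}] {user} on {host}: {action}"


def run_announced(action, command, send=print, user=None, host=None):
    """Announce before and after running a disruptive command.

    `send` is a placeholder (an assumption of this sketch) for the team's
    real channel, such as a chat room or mailing list.
    """
    user = user or getpass.getuser()
    host = host or socket.gethostname()

    # Tell the team first, so nobody is surprised while the command runs.
    now = datetime.now(timezone.utc)
    send(format_notice(f"starting: {action}", user, host, now))

    result = subprocess.run(command)

    # Tell the team again when it is over, including whether it failed.
    if result.returncode == 0:
        status = "finished"
    else:
        status = f"FAILED (exit {result.returncode})"
    now = datetime.now(timezone.utc)
    send(format_notice(f"{status}: {action}", user, host, now))
    return result.returncode
```

The second message matters as much as the first: a teammate who comes online later can see both that something happened and that it is over.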

You have access to production

25 July, 2006

At some point there will be system problems that you need to fix in production. Perhaps the problems only occur in production. You don’t have direct access to production. Therefore… you decide not to address the issue (?!).

Wrong! The answer is: you have access to production.

Track down someone who does have access. Work with them to get into the production system and either diagnose or fix the problem. Trust me, they will be happy to hear from you and happy to give you what you need. You must own the issue even if it requires access to systems beyond your direct control.

Succeed or face the wolves

28 June, 2006

The only way to survive is for your system to run well in production. As soon as you have production problems you make yourself vulnerable. If it is a momentary production issue then it’s OK. But if you can’t quickly solve the problem, or if the problem recurs, then prepare to face the wolves.

The wolves sense you are weak and they will circle around you. Once they engage they will tear your system to pieces. They will criticize the architecture, the coding, the implementation, the design. They will take some performance measurements, roll their eyes and say it obviously could never work, regardless of the fact that they have no baseline for their measurements.

Consultants will be called in. They will interview the team and study the data. They will look quite serious. They will write reports. You will tell them what the problem is and they will write it down. They will criticize everything you have done.

The only way to avoid the wolves is to make a system that works. The only way to prove that you know what you are doing is for your system to work. If it runs well in production then they will pass you by, looking for easier prey.

Be careful with production

4 April, 2006

Show great care when making configuration changes to a production system.

  • Double check the changes
  • Be careful if using a global search and replace to make the change (it often changes things you don’t expect)
  • After the change, check the process output to make sure it is working

And this part is counter-intuitive: make the change during the main working hours rather than during the off-hours. This way, if it fails, you will find out immediately rather than much later. A late failure is worse because you might not be around to fix it and it won’t be obvious what broke production.
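The first two checks (double check the changes, and be wary of a global search and replace) can be combined by previewing the replacement as a diff before anything is written. This is only a rough sketch using Python's standard `difflib` module. The function names and the sample configuration keys and hostnames are made up for illustration.

```python
import difflib


def preview_replace(text, old, new):
    """Return (updated_text, diff_lines) for a global search and replace.

    Nothing is written anywhere; the caller reads the diff first.
    """
    updated = text.replace(old, new)
    diff = list(difflib.unified_diff(
        text.splitlines(), updated.splitlines(),
        fromfile="current", tofile="proposed", lineterm="",
    ))
    return updated, diff


def changed_lines(diff):
    """Return only the added and removed lines of a unified diff."""
    # The first two entries are the "--- current" / "+++ proposed" headers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical config: the intent is to repoint db_host only.
config = "\n".join([
    "db_host=db1.internal",
    "cache_host=db1.internal-cache",
    "timeout=30",
])

updated, diff = preview_replace(config, "db1.internal", "db2.internal")
for line in changed_lines(diff):
    print(line)
```

In this made-up example the diff shows that `cache_host` changed along with `db_host`, which is exactly the kind of surprise a global replace produces. Reading the diff before applying it turns the surprise into a decision.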
